
CN114969417B - Image reordering method, related device and computer readable storage medium

Info

Publication number: CN114969417B
Application number: CN202210475225.7A
Authority: CN (China)
Prior art keywords: image, feature, text, distance, fused
Legal status: Active (granted)
Inventors: 郝磊, 许松岑, 李炜棉
Current and original assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd; priority to CN202210475225.7A
Other versions: CN114969417A (Chinese)

Classifications

    • G06F16/5838: Retrieval of still image data using metadata automatically derived from the content, using colour
    • G06F16/538: Querying; presentation of query results
    • G06F16/5846: Retrieval of still image data using metadata automatically derived from the content, using extracted text
    • G06F16/5854: Retrieval of still image data using metadata automatically derived from the content, using shape and object relationship
    • G06F16/5862: Retrieval of still image data using metadata automatically derived from the content, using texture
    • G06V10/761: Image or video pattern matching; proximity, similarity or dissimilarity measures in feature spaces
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the field of image retrieval within computer vision technology in the field of artificial intelligence, and provides an image reordering method, a related device, and a computer-readable storage medium. The method comprises the following steps: acquiring an image to be queried; extracting the image features of the image to be queried, and searching in an image database according to the image features to obtain an initial retrieval result; mapping the first image feature and the first text feature corresponding to each image in the initial retrieval result into the same target feature space to obtain a second image feature and a second text feature; remapping the second image feature to the image feature space and the second text feature to the text feature space to obtain the fused image feature and fused text feature corresponding to each image; and reordering the initial retrieval result based on the fused image feature and/or fused text feature corresponding to each image to obtain the final retrieval result. By implementing this application, the accuracy of the retrieval result can be improved.

Description

Image reordering method, related device and computer readable storage medium
This application is a divisional application of the original application with application number 202011012034.4, filed on September 23, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image retrieval, and in particular, to an image reordering method, a related device, and a computer-readable storage medium.
Background
In brief, image reordering reorders the results returned by an image search engine using the feature information contained in the images, so as to obtain search results that better satisfy the user. Generally, the feature information of an image includes its text features and its visual features. Each type of feature may be referred to as a modality.
Currently, image reordering algorithms can be divided into three categories: classification-based, clustering-based, and graph-model-based. In a classification-based algorithm, it is assumed that, in the search results returned by the search engine, the top-ranked images are relevant to the query and the bottom-ranked images are not. The algorithm uses these images as training samples, trains a binary classifier to determine whether an image is relevant to the query, and then re-ranks the images using the classification probability as the ranking score. Clustering-based algorithms mine latent patterns of images that are relevant or irrelevant to a query by clustering, and then use these patterns for reordering. Graph-model-based algorithms construct the image set into a graph whose nodes are the images and whose edges measure the similarity between images; image ranking is then performed using a link-analysis technique.
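To make the classification-based idea concrete, the following is a minimal sketch, not the patented method: the top-ranked results are treated as pseudo-positives, the bottom-ranked results as pseudo-negatives, and a binary classifier reranks by predicted probability (the feature shapes and the scikit-learn classifier are assumptions for illustration).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def classification_rerank(features, top_k=20, bottom_k=20):
    """Rerank search results by pseudo-relevance classification.

    features: (N, D) array of image features, ordered by the initial
    search-engine ranking (row 0 = most similar to the query).
    """
    # Pseudo-labels: assume top results are relevant, bottom results are not.
    X = np.vstack([features[:top_k], features[-bottom_k:]])
    y = np.concatenate([np.ones(top_k), np.zeros(bottom_k)])

    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # The classification probability serves as the new ranking score.
    scores = clf.predict_proba(features)[:, 1]
    return np.argsort(-scores)  # result indices, best first

# Example: rerank 100 results with 128-dimensional features
order = classification_rerank(np.random.randn(100, 128))
```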
These image ranking algorithms consider only a single image modality, which easily leads to inaccurate ranking results. How to improve the accuracy of the ranking result is therefore a technical problem that urgently needs to be solved.
Disclosure of Invention
The application provides an image reordering method, a related device, and a computer-readable storage medium, which can improve the accuracy of retrieval results.
In a first aspect, an image reordering method is provided, which may include the following steps. First, an image to be queried is acquired. Second, the image features of the image to be queried are extracted through an image feature extraction network, and a search is performed in an image database according to the image features to obtain an initial retrieval result. The initial retrieval result may include N images, arranged from high to low feature similarity, where N is an integer greater than 0. Each of the N images includes a first image feature, used to characterize the color, texture, shape, and spatial relationships of the image, and a first text feature, used to characterize the text information of the image; the first image feature is a feature in an image feature space, and the first text feature is a feature in a text feature space. Then, the first image feature and the first text feature corresponding to each image in the initial retrieval result are mapped into the same target feature space to obtain a second image feature and a second text feature; here, the second image feature and the second text feature maintain neighbor relationships with features of the other modality. Next, the second image feature is remapped to the image feature space to obtain the fused image feature corresponding to each image, and the second text feature is remapped to the text feature space to obtain the fused text feature corresponding to each image. For example, the fused image feature includes a partial image feature (e.g., a third image feature) and a partial text feature (e.g., a third text feature), and the proportion of the third image feature in the fused image feature is higher than that of the third text feature; the fused text feature includes a partial image feature (e.g., a fourth image feature) and a partial text feature (e.g., a fourth text feature), and the proportion of the fourth text feature in the fused text feature is higher than that of the fourth image feature. This form of expression allows the fused image feature to maintain a neighbor relationship with other image features within the same modality, and the fused text feature to maintain a neighbor relationship with other text features within the same modality. Finally, the initial retrieval result is reordered based on the fused image feature and/or fused text feature corresponding to each image to obtain the final retrieval result.
By implementing this embodiment of the application, because the obtained second image feature and second text feature take into account the neighbor relationships between modalities in the target feature space, the resulting fused image feature and fused text feature can keep the neighbor relationships of the original spaces; therefore, when the initial retrieval result is reordered based on the fused image feature and fused text feature corresponding to each image, the accuracy of the retrieval result can be improved. In the prior art, when a user is not satisfied with the final retrieval result, the computer device often has to perform multiple retrievals to obtain a result of high accuracy, which consumes a large amount of device resources, such as computing resources. Compared with the prior art, because the retrieval result here is already highly accurate, the computer device does not need to perform multiple retrievals, so its resource consumption can be reduced.
In one possible implementation, reordering the initial retrieval result based on the fused image feature and fused text feature corresponding to each image may proceed as follows. First, among N+1 images, the distance between every two images is determined according to the fused image features and fused text features the two images contain, where the N+1 images comprise the image to be queried and the N images in the initial retrieval result. Then, based on the determined distances, the K-mutual-neighbor relation corresponding to each of the N+1 images is determined, where the K-mutual-neighbor relation represents that image a is a K-nearest neighbor of image b and image b is also a K-nearest neighbor of image a. Next, the Jaccard distance between the image to be queried and each image in the initial retrieval result is calculated according to the K-mutual-neighbor relations; the Jaccard distance is an index measuring the dissimilarity between two sets, defined as 1 minus the Jaccard similarity coefficient (also known as the Jaccard index), which measures the similarity between two sets. Finally, the initial retrieval result is reordered according to the Jaccard distance. By implementing this embodiment, each image has a corresponding fused image feature and fused text feature that represent the content of the same image; when the distance between every two images is determined according to these features, the initial retrieval result can be reordered based on the determined distances, which improves the accuracy of the ranking result and avoids undesirable entries in the final retrieval result.
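A simplified sketch of this reordering step follows: K-mutual-neighbor sets followed by a set-based Jaccard distance. The plain-set formulation below is an illustrative simplification under assumed inputs, not the patent's exact computation.

```python
import numpy as np

def k_mutual_neighbor_sets(dist, k):
    """dist: (M, M) pairwise distance matrix over the N+1 images
    (index 0 = query, 1..N = initial results). Returns, for each image,
    its set of K-mutual neighbors."""
    knn = np.argsort(dist, axis=1)[:, :k + 1]  # k nearest neighbors (plus self)
    sets = []
    for a in range(len(dist)):
        # b is a K-mutual neighbor of a iff a is also among b's k-NN.
        sets.append({int(b) for b in knn[a] if a in knn[b]})
    return sets

def jaccard_distance(set_a, set_b):
    union = len(set_a | set_b)
    return 1.0 - len(set_a & set_b) / union if union else 1.0

def rerank_by_jaccard(dist, k=10):
    """Reorder the N results by Jaccard distance to the query."""
    sets = k_mutual_neighbor_sets(dist, k)
    d = [jaccard_distance(sets[0], sets[j]) for j in range(1, len(dist))]
    return np.argsort(d) + 1  # image indices in the N+1 array, closest first
```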
In one possible implementation, consider the ith and jth images among the N+1 images, where the ith image contains a fused image feature X and a fused text feature Y, and the jth image contains a fused image feature P and a fused text feature Q. Determining the distance between the two images according to their fused image features and fused text features may proceed as follows: first, a first distance is determined from the fused image feature X and the fused image feature P; a second distance is determined from the fused text feature Y and the fused text feature Q; a third distance is determined from the fused image feature X and the fused text feature Q; and a fourth distance is determined from the fused text feature Y and the fused image feature P. Then, the distance between the ith image and the jth image is determined from the first, second, third, and fourth distances.
In one possible implementation, the first distance and the second distance are used to characterize the distance between the ith image and the jth image within the same modality, and the third distance and the fourth distance are used to characterize the distance between the ith image and the jth image across different modalities. In this implementation, when the distance between every two images is determined, both the same-modality and cross-modality distances are fully considered; compared with the prior art, the accuracy of the retrieval result can therefore be improved, repeated retrieval by the computer device can be avoided, and the resource consumption of the computer device can be reduced.
In one possible implementation, reordering the initial retrieval result based on the fused image feature corresponding to each image may proceed as follows: first, the similarity between the fused image feature corresponding to the image to be queried and the fused image feature corresponding to each image in the initial retrieval result is obtained in turn; then, the initial retrieval result is reordered according to the determined similarities. By implementing this approach, the accuracy of the ranking result can be improved and undesirable entries in the final retrieval result can be avoided.
In one possible implementation, reordering the initial retrieval result based on the fused text feature corresponding to each image may proceed as follows: first, the similarity between the fused text feature corresponding to the image to be queried and the fused text feature corresponding to each image in the initial retrieval result is obtained in turn; then, the initial retrieval result is reordered according to the determined similarities. By implementing this embodiment, the accuracy of the ranking result can be improved and an unsatisfactory final retrieval result can be avoided.
In a possible implementation, before determining the distance between two of the N+1 images according to their fused image features and fused text features, the method may further include the following step: performing a weighted average of the first text features corresponding to the first L images in the initial retrieval result to obtain the fused text feature corresponding to the image to be queried, where L is an integer greater than 0 and less than N. The image to be queried has only a corresponding image feature and no corresponding text feature; by weighted-averaging the first text features of the first L images in the initial retrieval result, a text feature for the image to be queried can be obtained, which facilitates the subsequent calculation of the distance between every two images.
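A minimal sketch of this step; the rank-decaying weights are an assumption, since the patent does not fix a weighting scheme.

```python
import numpy as np

def query_text_feature(result_text_feats, L):
    """result_text_feats: (N, D) first text features of the initial
    results, ordered by rank. Returns a weighted average of the top-L
    features to serve as the query image's text feature."""
    top = result_text_feats[:L]
    # Assumed weighting: higher-ranked results contribute more.
    w = np.arange(L, 0, -1, dtype=float)
    w /= w.sum()
    return (w[:, None] * top).sum(axis=0)
```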
In a second aspect, an embodiment of the present application provides an image reordering apparatus, which may include: an image acquisition unit, configured to acquire an image to be queried; a first retrieval unit, configured to extract the image features of the image to be queried and search in the image database according to the image features to obtain an initial retrieval result, where the initial retrieval result comprises N images arranged from high to low feature similarity, each of the N images includes a first image feature used to characterize the color, texture, shape, and spatial relationships of the image and a first text feature used to characterize the text information of the image, N is an integer greater than 0, the first image feature is a feature in an image feature space, and the first text feature is a feature in a text feature space; a first feature mapping unit, configured to map the first image feature and the first text feature corresponding to each image in the initial retrieval result into the same target feature space to obtain a second image feature and a second text feature; a second feature mapping unit, configured to remap the second image feature to the image feature space to obtain the fused image feature corresponding to each image, and to remap the second text feature to the text feature space to obtain the fused text feature corresponding to each image, where the fused image feature has a neighbor relationship with other image features within the same modality, the fused text feature has a neighbor relationship with other text features within the same modality, and one feature type is used to characterize one modality; and a second retrieval unit, configured to reorder the initial retrieval result based on the fused image feature and/or fused text feature corresponding to each image to obtain the final retrieval result.
In one possible implementation, the fused image feature includes a third image feature and a third text feature, and the proportion of the third image feature in the fused image feature is higher than that of the third text feature; the fused text feature includes a fourth image feature and a fourth text feature, and the proportion of the fourth text feature in the fused text feature is higher than that of the fourth image feature.
In a possible implementation, the second retrieval unit comprises a distance calculation unit and a reordering unit. The distance calculation unit is configured to determine, among the N+1 images, the distance between every two images according to the fused image features and fused text features the two images contain, where the N+1 images comprise the image to be queried and the N images in the initial retrieval result. The reordering unit is configured to determine, based on the determined distances, the K-mutual-neighbor relation corresponding to each of the N+1 images, where the K-mutual-neighbor relation represents that image a is a K-nearest neighbor of image b and image b is also a K-nearest neighbor of image a; to calculate the Jaccard distance between the image to be queried and each image in the initial retrieval result according to the K-mutual-neighbor relations; and to reorder the initial retrieval result according to the Jaccard distance.
In one possible implementation, for the ith and jth images of the N +1 images; the ith image comprises a fused image feature X and a fused text feature Y; the jth image comprises a fusion image feature P and a fusion text feature Q; the distance calculation unit is specifically configured to: determining a first distance according to the fusion image characteristic X and the fusion image characteristic P; determining a second distance according to the fusion text characteristic Y and the fusion text characteristic Q; determining a third distance according to the fusion image characteristic X and the fusion text characteristic Q; determining a fourth distance according to the fusion text characteristic Y and the fusion image characteristic P; and determining the distance between the ith image and the jth image according to the first distance, the second distance, the third distance and the fourth distance.
In one possible implementation, the first distance and the second distance are used to characterize the distance between the ith image and the jth image within the same modality; the third distance and the fourth distance are used to characterize the distance between the ith image and the jth image across different modalities.
In a possible implementation manner, the second retrieval unit comprises a feature similarity calculation unit and a reordering unit, wherein the feature similarity calculation unit is used for sequentially obtaining the similarity between the fusion image features corresponding to the image to be queried and the fusion image features corresponding to each image in the initial retrieval result; and the reordering unit is used for reordering the initial retrieval results according to the determined similarity.
In a possible implementation manner, the second retrieval unit comprises a feature similarity calculation unit and a reordering unit, wherein the feature similarity calculation unit is used for sequentially obtaining the similarity between the fusion text features corresponding to the image to be queried and the fusion text features corresponding to each image in the initial retrieval result; and the reordering unit is used for reordering the initial retrieval results according to the determined similarity.
In one possible implementation, the apparatus further includes: a feature extraction unit, configured to perform a weighted average of the first text features corresponding to the first L images in the initial retrieval result to obtain the fused text feature corresponding to the image to be queried; L is an integer greater than 0 and less than N.
In a third aspect, an embodiment of the present application further provides an image reordering device, which may include a memory and a processor, where the memory is configured to store a computer program that supports the device in executing the above method, the computer program includes program instructions, and the processor is configured to call the program instructions to perform the method of the first aspect.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, the computer program comprising program instructions, which, when executed by a processor, cause the processor to perform the method of the first aspect.
In a fifth aspect, embodiments of the present application further provide a computer program, where the computer program includes computer software instructions that, when executed by a computer, cause the computer to perform the method according to the first aspect.
Drawings
Fig. 1a is a schematic diagram of a first application scenario provided in an embodiment of the present application;
fig. 1b is a schematic diagram of a second application scenario provided in the embodiment of the present application;
FIG. 2a is a schematic structural diagram of a multi-modal fusion model 20 according to an embodiment of the present application;
fig. 2b is a schematic structural diagram of a Resnet50 model according to an embodiment of the present application;
fig. 3a is a schematic flowchart of an image reordering method according to an embodiment of the present disclosure;
fig. 3b is a schematic diagram of obtaining an initial search result according to an embodiment of the present disclosure;
fig. 3c is a schematic diagram of obtaining a final search result according to an embodiment of the present disclosure;
FIG. 3d is a schematic diagram of a search according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an image reordering device 40 according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an image reordering device 50 according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some of the embodiments of the present application, not all of them.
The terms "first" and "second" and the like in the specification and drawings of the present application are used for distinguishing different objects or for distinguishing different processes for the same object, and are not used for describing a specific order of the objects. Furthermore, the terms "including" and "having," and any variations thereof, as referred to in the description of the present application, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. It should be noted that in the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as examples, illustrations or descriptions. Any embodiment or design method described herein as "exemplary" or "such as" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion. In the examples of the present application, "A and/or B" means both A and B, and A or B. "A, and/or B, and/or C" means any one of A, B, C, or means any two of A, B, C, or means A and B and C.
In order to better understand the technical solutions described in the present application, the following first explains the related terms related to the embodiments of the present application:
(1) Same object image retrieval
In this embodiment of the present application, same-object image retrieval refers to querying for an object in an image and finding, in an image database, images that contain that object. For example, given a "Mona Lisa" image, the goal of same-object image retrieval is to retrieve images containing the "Mona Lisa" figure from the image database. Specifically, after ranking by the similarity measure, the images containing the "Mona Lisa" figure should appear as close to the front of the search results as possible.
(2) Same category image retrieval
In the embodiment of the present application, the same category image retrieval, also called similar object image retrieval, refers to finding out an image belonging to the same category as a given query image from an image database.
(3) Image reordering
In the embodiment of the application, the image reordering is to reorder the results returned by the image search engine by using the feature information contained in the image, so as to obtain the search results which are more satisfactory to the user.
In order to facilitate a better understanding of the present application, the following presents several application scenarios to which the method described in the present application can be applied:
a first application scenario: similar image retrieval
As shown in fig. 1a, a plurality of applications are displayed on the display interface of the electronic device. When the user performs a touch operation (e.g., a click, press, or slide operation) on the "browser" application 201, the electronic device displays the search box of the browser application 201 (e.g., as shown in part b of fig. 1a). The user inputs a Mona Lisa image in the search box, and the electronic device searches the image database for images containing the Mona Lisa figure according to feature similarity, obtaining an initial retrieval result. Because each image in the initial retrieval result often contains a first image feature and a first text feature, by the method described in the present application, the fused image feature and fused text feature corresponding to each image can be obtained through the multi-modal fusion model 20. The fused image feature includes a third image feature and a third text feature, with the third image feature occupying a higher proportion than the third text feature, so that the fused image feature maintains a neighbor relationship with other image features within the same modality; the fused text feature includes a fourth image feature and a fourth text feature, with the fourth text feature occupying a higher proportion than the fourth image feature, so that the fused text feature maintains a neighbor relationship with other text features within the same modality. The initial retrieval result is then reordered based on the fused image features and/or fused text features, returning a final retrieval result closer to what the user expects, e.g., the Mona Lisa image shown in part c of fig. 1a.
A second application scenario: commodity retrieval
As shown in fig. 1b, a plurality of applications are displayed on the display interface of the electronic device. The user opens the camera (e.g., as shown in part a of fig. 1b, by performing a touch operation on the camera application) and photographs a target item (e.g., a Huawei nova 7 mobile phone) through the camera (e.g., as shown in part b of fig. 1b) in order to search by the target item. The electronic device retrieves goods similar to the target item according to feature similarity, obtaining an initial retrieval result. Because each image in the initial retrieval result often contains a first image feature and a first text feature, by the method described in the present application, the fused image feature and fused text feature corresponding to each image can be obtained through the multi-modal fusion model 20. The fused image feature includes a third image feature and a third text feature, with the third image feature occupying a higher proportion than the third text feature, so that the fused image feature maintains a neighbor relationship with other image features within the same modality; the fused text feature includes a fourth image feature and a fourth text feature, with the fourth text feature occupying a higher proportion than the fourth image feature, so that the fused text feature maintains a neighbor relationship with other text features within the same modality. The initial retrieval result is then reordered based on the fused image features and/or fused text features, and a final retrieval result closer to the user's expectation is returned, e.g., links to mobile-phone products as shown in part c of fig. 1b.
The following describes in detail the specific structure of the multimodal fusion model to which the present application relates. As shown in FIG. 2a, the multimodal fusion model 20 includes an image feature extraction network 210, a text feature extraction network 220, an encoder 230, an image feature decoder 240, and a text feature decoder 250.
The image feature extraction network 210 is configured to extract the image features of an input image to obtain a first image feature, which is a feature in an image feature space. The text feature extraction network 220 is configured to extract the text features of an input text to obtain a first text feature, which is a feature in a text feature space. The first image feature and the first text feature are then input into the same encoder 230, which maps them into the same target feature space to obtain a second image feature and a second text feature, respectively. Illustratively, the encoder 230 is a parameter-sharing structure. Then, the second image feature is remapped to the image feature space through the image feature decoder 240 to obtain a fused image feature; the fused image feature includes a third image feature and a third text feature, with the third image feature occupying a higher proportion, so that the fused image feature maintains a neighbor relationship with other image features within the same modality. The second text feature is remapped to the text feature space through the text feature decoder 250 to obtain a fused text feature; the fused text feature includes a fourth image feature and a fourth text feature, with the fourth text feature occupying a higher proportion, so that the fused text feature maintains a neighbor relationship with other text features within the same modality. Here, the image feature decoder 240 and the text feature decoder 250 are both fully connected layer structures. The fused image features and/or fused text features obtained through the multimodal fusion model 20 may be used to reorder the initial retrieval result to obtain the final retrieval result.
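A minimal PyTorch sketch of this architecture, with a parameter-shared encoder and two fully connected decoders; the layer sizes and the per-modality projection layers are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, shared_dim=512):
        super().__init__()
        # Project both modalities to a common size, then share one encoder
        # (parameter sharing, as described for encoder 230).
        self.img_proj = nn.Linear(img_dim, shared_dim)
        self.txt_proj = nn.Linear(txt_dim, shared_dim)
        self.encoder = nn.Sequential(nn.Linear(shared_dim, shared_dim), nn.ReLU())
        # Fully connected decoders back to each original feature space.
        self.img_decoder = nn.Linear(shared_dim, img_dim)
        self.txt_decoder = nn.Linear(shared_dim, txt_dim)

    def forward(self, img_feat, txt_feat):
        z_img = self.encoder(self.img_proj(img_feat))  # second image feature
        z_txt = self.encoder(self.txt_proj(txt_feat))  # second text feature
        fused_img = self.img_decoder(z_img)            # fused image feature
        fused_txt = self.txt_decoder(z_txt)            # fused text feature
        return z_img, z_txt, fused_img, fused_txt

# Example: a batch of 8 image/text feature pairs
model = MultimodalFusion()
out = model(torch.randn(8, 2048), torch.randn(8, 300))
```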
Illustratively, the image feature extraction network 210 may be the Resnet50 model and the text feature extraction network 220 may be the Word2Vec model. These are explained below:
(1) Resnet50 model
Specifically, the Resnet50 model may be built based on a convolutional neural network and includes multiple convolutional layers and multiple pooling layers. As shown in fig. 2b, the Resnet50 model may include layers 221-226. For example, in one implementation, 221 is a convolutional layer, 222 is a pooling layer, 223 is a convolutional layer, 224 is a pooling layer, 225 is a convolutional layer, and 226 is a pooling layer; in another implementation, 221 and 222 are convolutional layers, 223 is a pooling layer, 224 and 225 are convolutional layers, and 226 is a pooling layer. That is, the output of a convolutional layer may be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
The internal operation of a convolutional layer will be described below by taking convolutional layer 221 as an example.
Convolutional layer 221 may include a number of convolution operators, also called kernels. In image processing, a convolution operator acts as a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix typically moves over the input image one pixel at a time (or two pixels at a time, depending on the value of the stride) in the horizontal direction, thereby extracting a specific feature from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends over the entire depth of the input image. Thus, convolving with a single weight matrix produces a convolved output with a single depth dimension, but in most cases a plurality of weight matrices of the same size (rows by columns) is applied rather than a single one. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where that dimension is understood to be determined by the "plurality" described above. Different weight matrices may be used to extract different features of the image: for example, one weight matrix extracts image edge information, another extracts a particular color of the image, and yet another blurs unwanted noise in the image. Because the weight matrices have the same size, the feature maps they extract also have the same size, and these same-sized feature maps are combined to form the output of the convolution operation.
In practical applications, the weight values in these weight matrices need to be obtained through a large amount of training. Each weight matrix formed by the trained weight values can be used to extract information from the input image, enabling the convolutional neural network to make correct predictions.
When the convolutional neural network has multiple convolutional layers, the initial convolutional layers (e.g., 221) tend to extract general features, which may also be called low-level features. As the depth of the network increases, the later convolutional layers (e.g., 226) extract increasingly complex features, such as features with high-level semantics; features with higher semantics are better suited to the problem to be solved.
A pooling layer:
Since it is often desirable to reduce the number of training parameters, pooling layers are often introduced periodically after convolutional layers. In the layers 221-226 illustrated in fig. 2b, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
Specifically, the pooling layer is used to downsample the data and reduce its quantity. Taking image data as an example, the spatial size of the image can be reduced by the pooling layer during image processing. In general, the pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to a smaller size. The average pooling operator computes the average of the pixel values within a certain range of the image as the result of average pooling; the max pooling operator takes the pixel with the largest value within a particular range as the result of max pooling. In addition, just as the size of the weight matrix in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the output image represents the average or maximum value of the corresponding sub-region of the input image.
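For example, in PyTorch the two pooling operators reduce a feature map's spatial size as follows (the tensor sizes are chosen arbitrarily for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)        # one 224x224 three-channel feature map
avg = nn.AvgPool2d(kernel_size=2)(x)   # each output pixel = mean of a 2x2 region
mx = nn.MaxPool2d(kernel_size=2)(x)    # each output pixel = max of a 2x2 region
print(avg.shape, mx.shape)             # both: torch.Size([1, 3, 112, 112])
```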
(2) Word2Vec model
Specifically, the Word2Vec model is a set of related models used to generate word vectors. After training is completed, the Word2Vec model can map each word to a vector that represents word-to-word relationships; this vector corresponds to the hidden layer of the neural network.
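A minimal usage sketch with the gensim library (an implementation choice assumed here; the patent does not name one):

```python
from gensim.models import Word2Vec

# Each "sentence" is a tokenized text description of an image.
sentences = [["red", "sports", "car"], ["oil", "painting", "portrait"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)

vec = model.wv["car"]  # 100-dimensional word vector (the hidden-layer weights)
```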
It should be noted that the multi-modal fusion model 20 described above is only an example, and in a specific application, the multi-modal fusion model 20 may also exist in the form of other network models, which is not limited herein.
The method according to the embodiments of the present application is described in detail below. Fig. 3a is a schematic flowchart of an image reordering method according to an embodiment of the present application. The method may be implemented in a computer device and may include, but is not limited to, the following steps:
S301, acquiring an image to be queried.
In the embodiment of the application, a user can upload an image to be queried to the computer device, and the computer device acquires the image to be queried. It should be noted that the image to be queried may be a single image, or a video frame extracted from a segment of video, and so on.
S302, extracting the image features of the image to be queried, and searching in an image database according to the image features to obtain an initial retrieval result. The initial retrieval result comprises N images, arranged from high to low feature similarity. Each of the N images includes a first image feature used to characterize the color, texture, shape, and spatial relationships of the image, and a first text feature used to characterize the text information (e.g., text) of the image; N is an integer greater than 0.
In this embodiment, an image database may be pre-established on the computer device, storing a large number of sample images and the structural information (e.g., the first image feature and first text feature) corresponding to each sample image. For example, 1000 images covering ten major categories of subjects can be selected from the benchmark image library COREL as sample images. With the sample images and the structural information of each sample image, a large number of sample images can be combined into an organized, structured image database.
In the embodiment of the present application, when the image database is constructed, the image feature of the image may be extracted through the image feature extraction network 210 to obtain a first image feature; the text feature in the image may be extracted by the text feature extraction network 220 to obtain a first text feature.
After the computer device obtains the image to be queried input by the user, as shown in fig. 3b, the image features of the image to be queried may be extracted, and the similarity between these image features and the first image feature corresponding to each image in the image database may be calculated, so that an initial retrieval result can be obtained according to feature similarity. The initial retrieval result includes N images, arranged from high to low feature similarity, where N is an integer greater than 0. Then, after obtaining the initial retrieval result, as shown in fig. 3c, the initial retrieval result may be further processed to obtain the final retrieval result. The first image feature is a feature in an image feature space; the first text feature is a feature in a text feature space.
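A minimal sketch of this initial retrieval step, assuming cosine similarity over precomputed first image features (the patent does not fix the similarity measure):

```python
import numpy as np

def initial_search(query_feat, db_feats, n):
    """query_feat: (D,) image feature of the query image.
    db_feats: (M, D) first image features of the database images.
    Returns the indices of the top-n images, highest similarity first."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q                 # cosine similarity per database image
    return np.argsort(-sims)[:n]  # N result indices, sorted high to low
```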
Step S303, mapping the first image feature and the first text feature corresponding to each image in the initial retrieval result into the same target feature space to obtain a second image feature and a second text feature.
In this embodiment of the present application, the encoder 230 may map the first image feature and the first text feature corresponding to each image in the initial retrieval result into the same target feature space, so as to obtain a second image feature and a second text feature.
As mentioned above, an image s includes a first image feature i and a first text feature c (that is, for one image, the first image feature i and the first text feature c are both representations of the image content), and the cosine distance is used to calculate the similarity s(i, c) between the first image feature i and the first text feature c. This also means that, for image s, besides the corresponding first image feature i and first text feature c, there are non-corresponding image features and non-corresponding text features in the image database.
To ensure the consistency between the first image feature and the first text feature, a loss function is used as a constraint in the encoder 230, which can be expressed as:
L = max(0, α - s(i, c) + s(i, c′)) + max(0, α - s(i, c) + s(i′, c))

where s(i, c) represents the cosine similarity between a first image feature i and its corresponding first text feature c, s(i, c′) represents the cosine similarity between the first image feature i and a non-corresponding first text feature c′, s(i′, c) represents the cosine similarity between a non-corresponding first image feature i′ and the first text feature c, and α is a margin hyperparameter.
This loss function adopts a triplet loss to constrain the consistency between the first image feature i and the first text feature c: it increases the similarity between a matched first image feature i and its corresponding first text feature c, while reducing the similarity between the first image feature i and a non-corresponding first text feature c′ and the similarity between a non-corresponding first image feature i′ and the first text feature c.
Due to the above constraint, the second image feature can maintain neighbor relationships with features of the other modality, and the second text feature can likewise maintain neighbor relationships with features of the other modality.
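A sketch of this triplet constraint in PyTorch, assuming in-batch negatives and a margin hyperparameter alpha (both assumptions; the patent states only the general form):

```python
import torch
import torch.nn.functional as F

def triplet_ranking_loss(z_img, z_txt, alpha=0.2):
    """z_img, z_txt: (B, D) second image/text features of B matched pairs.
    Pulls matched pairs together, pushes mismatched pairs apart."""
    z_img = F.normalize(z_img, dim=1)
    z_txt = F.normalize(z_txt, dim=1)
    s = z_img @ z_txt.t()        # s[i, c]: all pairwise cosine similarities
    pos = s.diag().unsqueeze(1)  # s(i, c) of each matched pair

    # Hinge terms for non-corresponding text c' (rows) and image i' (columns).
    cost_txt = (alpha - pos + s).clamp(min=0)
    cost_img = (alpha - pos.t() + s).clamp(min=0)

    # Do not penalize the matched pairs themselves (the diagonal).
    mask = torch.eye(s.size(0), dtype=torch.bool, device=s.device)
    cost_txt = cost_txt.masked_fill(mask, 0)
    cost_img = cost_img.masked_fill(mask, 0)
    return cost_txt.mean() + cost_img.mean()
```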
Step S304, remapping the second image feature to the image feature space to obtain the fused image feature corresponding to each image, and remapping the second text feature to the text feature space to obtain the fused text feature corresponding to each image. The fused image feature keeps a neighbor relationship with other image features within the same modality; the fused text feature keeps a neighbor relationship with other text features within the same modality.
In the embodiment of the present application, in order for the output fused image features and fused text features to fuse multi-modal information, a Mean-Square Error (MSE) loss function is used as a constraint in the image feature decoder 240 and the text feature decoder 250. Specifically, the loss function may be expressed as:
S′ = β·S_i + (1 - β)·S_c,  β ∈ [0, 1]

L = mse_loss(S′, output)

where β represents a weight coefficient, S_i represents the image feature in the original space, and S_c represents the text feature in the original space. In the image feature decoder 240, output is the output of the image feature decoder 240; in the text feature decoder 250, output is the output of the text feature decoder 250.
In the embodiment of the present application, the second image feature is remapped to the image feature space through the image feature decoder 240 to obtain the fused image feature; the fused image feature includes a third image feature and a third text feature, with the third image feature occupying a higher proportion than the third text feature, so that the fused image feature maintains a neighbor relationship with other image features within the same modality. The second text feature is remapped to the text feature space through the text feature decoder 250 to obtain the fused text feature; the fused text feature includes a fourth image feature and a fourth text feature, with the fourth text feature occupying a higher proportion than the fourth image feature, so that the fused text feature maintains a neighbor relationship with other text features within the same modality.
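Continuing the sketch, the decoder-side MSE constraint can be written as follows, assuming for simplicity that the image and text feature spaces share one dimensionality (so that S_i and S_c can be mixed linearly), and that β is above 0.5 for the image decoder and below 0.5 for the text decoder, consistent with the stated feature proportions (all assumptions):

```python
import torch.nn.functional as F

def fusion_loss(img_feat, txt_feat, fused_img, fused_txt,
                beta_img=0.8, beta_txt=0.2):
    """img_feat/txt_feat: original-space features S_i and S_c, shape (B, D).
    fused_img/fused_txt: decoder outputs. beta_img > 0.5 keeps the fused
    image feature image-dominant; beta_txt < 0.5 keeps the fused text
    feature text-dominant."""
    target_img = beta_img * img_feat + (1 - beta_img) * txt_feat  # S' for decoder 240
    target_txt = beta_txt * img_feat + (1 - beta_txt) * txt_feat  # S' for decoder 250
    return F.mse_loss(fused_img, target_img) + F.mse_loss(fused_txt, target_txt)
```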
In the present application, one type of feature is used to characterize one modality. For example, the fused image features of an image represent one modality of the image. As another example, the fused text features of an image represent a modality of the image.
S305, reordering the initial retrieval result based on the fused image features and/or fused text features corresponding to each image to obtain the final retrieval result.
In some embodiments, reordering the initial retrieval result based on the fused image feature corresponding to each image may proceed as follows: first, the fused-image-feature similarity between the image to be queried and each image in the initial retrieval result is obtained in turn; then, the initial retrieval result is reordered according to the determined similarities to obtain the final retrieval result. For example, suppose the initial retrieval result includes 5 images: image 1, image 2, image 3, image 4, and image 5, and the similarities between the fused image feature of the image to be queried and the fused image features of images 1 through 5 are 0.8, 0.5, 0.9, 0.85, and 0.7, respectively. In this case, reordering the initial retrieval result according to these similarities yields the final retrieval result: image 3, image 4, image 1, image 5, and image 2.
In some embodiments, reordering the initial retrieval result based on the fused text feature corresponding to each image may proceed as follows: first, the fused-text-feature similarity between the image to be queried and each image in the initial retrieval result is obtained in turn; then, the initial retrieval result is reordered according to the determined similarities to obtain the final retrieval result. For example, suppose the initial retrieval result includes 5 images: image 1, image 2, image 3, image 4, and image 5, and the similarities between the fused text feature of the image to be queried and the fused text features of images 1 through 5 are 0.8, 0.85, 0.9, 0.75, and 0.7, respectively. In this case, reordering the initial retrieval result according to these similarities yields the final retrieval result: image 3, image 2, image 1, image 4, and image 5.
In some embodiments, an implementation of reordering based on the fused image feature and the fused text feature corresponding to each image to obtain the final retrieval result may include: first, among the N +1 images, determining the distance between every two images according to the fused image features and the fused text features of the two images, where the N +1 images comprise the image to be queried and the N images in the initial retrieval result. For example, consider the ith image and the jth image among the N +1 images, where the ith image comprises a fused image feature X and a fused text feature Y, and the jth image comprises a fused image feature P and a fused text feature Q. When calculating the distance between the ith image and the jth image: first, a first distance is determined according to the fused image feature X and the fused image feature P; a second distance is determined according to the fused text feature Y and the fused text feature Q; a third distance is determined according to the fused image feature X and the fused text feature Q; and a fourth distance is determined according to the fused text feature Y and the fused image feature P. Then, the distance between the ith image and the jth image is determined according to the first distance, the second distance, the third distance and the fourth distance.
Specifically, the distance between the ith image and the jth image may be calculated according to a first formula:
D(i,j)=w*d1+w*d2+(1-w)*d3+(1-w)*d4
where w is a weight coefficient; d1 represents the distance between the fused image feature X of the ith image and the fused image feature P of the jth image, and characterizes the distance between the ith image and the jth image within the same modality (this distance reflects their similarity within that modality); d2 represents the distance between the fused text feature Y of the ith image and the fused text feature Q of the jth image, and likewise characterizes a distance within the same modality; d3 represents the distance between the fused image feature X of the ith image and the fused text feature Q of the jth image, and characterizes the distance between the ith image and the jth image across different modalities (this distance reflects their similarity across modalities); and d4 represents the distance between the fused text feature Y of the ith image and the fused image feature P of the jth image, and likewise characterizes a distance across different modalities.
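As an illustration, a hedged sketch of this first formula is given below (Python with NumPy; the Euclidean metric, the value of w, and the assumption that all four fused features share one dimension are illustrative choices not fixed by the description).

```python
import numpy as np

def modal_distance(X, Y, P, Q, w=0.7):
    """D(i,j) = w*d1 + w*d2 + (1-w)*d3 + (1-w)*d4 for the ith and jth images.

    X, Y: fused image / fused text features of the ith image
    P, Q: fused image / fused text features of the jth image
    The Euclidean metric and w = 0.7 are assumptions for illustration.
    """
    d1 = np.linalg.norm(X - P)   # image vs. image: same modality
    d2 = np.linalg.norm(Y - Q)   # text  vs. text:  same modality
    d3 = np.linalg.norm(X - Q)   # image vs. text:  cross modality
    d4 = np.linalg.norm(Y - P)   # text  vs. image: cross modality
    return w * d1 + w * d2 + (1 - w) * d3 + (1 - w) * d4
```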
Then, the initial search results are reordered based on the determined distances.
By this method, the distance between the image to be queried and each image in the initial retrieval result can be obtained. For example, the initial retrieval result includes 5 images: image 1, image 2, image 3, image 4 and image 5, where the distance between the image to be queried and image 1 is 0.7; and image 2, 0.8; and image 3, 0.85; and image 4, 0.6; and image 5, 0.4. In this case, the initial retrieval result may be reordered according to the distance between the image to be queried and each image, and the reordered final retrieval result may be: image 3, image 2, image 1, image 4 and image 5.
It can be understood that, by this method, in addition to the distance between the image to be queried and each image in the initial retrieval result, the distance between any two images in the initial retrieval result can also be obtained. For example, suppose the initial retrieval result includes image 1 and image 2. By this method, for the image to be queried, the distance between it and image 1 and the distance between it and image 2 can be obtained; for image 1, the distance between it and the image to be queried and the distance between it and image 2 can be obtained; and for image 2, the distance between it and the image to be queried and the distance between it and image 1 can be obtained. Then, based on the determined distances, the K mutual neighbor relationship corresponding to each of the N +1 images can be determined, where the K mutual neighbor relationship is used to characterize that image a is a K neighbor of image b and image b is also a K neighbor of image a. Next, the Jaccard distance between the image to be queried and each image in the initial retrieval result is calculated according to the K mutual neighbor relationships. For example, among the N +1 images, the relationship between two images is encoded, and the encoding method can be as follows:
V(p, g_i) = exp(-d(p, g_i)), if image p and image g_i are K mutual neighbors of each other; V(p, g_i) = 0, otherwise

where d(p, g_i) represents the distance between image p and image g_i.
After the images are encoded by the above encoding method, the Jaccard distance between the image to be queried and each image in the initial retrieval result is calculated, for example, according to the following formula:
d_J(p, g_i) = 1 - sum_j min(V(p, g_j), V(g_i, g_j)) / sum_j max(V(p, g_j), V(g_i, g_j))

where d_J(p, g_i) represents the Jaccard distance between image p and image g_i; V(p, g_j) represents the encoded distance between image p and image g_j; and V(g_i, g_j) represents the encoded distance between image g_j and image g_i.
Specifically, the Jaccard distance is an index used to measure the dissimilarity between two sets; it is the complement of the Jaccard similarity coefficient, i.e., it is defined as 1 minus the Jaccard similarity coefficient. The Jaccard similarity coefficient, also known as the Jaccard index, is an index used to measure the similarity between two sets.
Then, after the Jaccard distances are obtained, the initial retrieval result may be reordered according to the Jaccard distances. For example, the initial retrieval result includes 5 images: image 1, image 2, image 3, image 4 and image 5, where the Jaccard distance between the image to be queried and image 1 is 0.8; and image 2, 0.85; and image 3, 0.9; and image 4, 0.75; and image 5, 0.7. In this case, the initial retrieval result is reordered according to the determined Jaccard distances, and the reordered final retrieval result may be: image 3, image 2, image 1, image 4 and image 5. As shown in fig. 3d, when the image features of the image to be queried are retrieved in the image database, the obtained initial retrieval result includes a plurality of images that do not meet the requirement (for example, the images marked by red frames in the figure); by reordering the initial retrieval result with the method described in the present application, the images that do not meet the requirement can be eliminated, yielding a final retrieval result that meets the user's expectation.
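Putting these pieces together, a compact sketch of the K-mutual-neighbor encoding and Jaccard re-ranking step is given below (Python with NumPy; the value of k, the exp(-d) encoding, and the matrix layout follow the reconstruction above and are assumptions where the description is not explicit).

```python
import numpy as np

def jaccard_distances(dist, k=5):
    """Jaccard distance between the query and every image in the initial result.

    dist: (N+1) x (N+1) matrix of distances D(i, j), with row/column 0 being
    the image to be queried and zeros on the diagonal.
    """
    n = dist.shape[0]
    knn = np.argsort(dist, axis=1)[:, 1:k + 1]   # k nearest neighbors, self excluded
    V = np.zeros_like(dist)                      # encoded distances
    for a in range(n):
        for b in knn[a]:
            if a in knn[b]:                      # a and b are K mutual neighbors
                V[a, b] = np.exp(-dist[a, b])
    d_j = np.ones(n - 1)
    for i in range(1, n):
        den = np.maximum(V[0], V[i]).sum()
        if den > 0:
            d_j[i - 1] = 1.0 - np.minimum(V[0], V[i]).sum() / den
    return d_j   # the initial retrieval result is then reordered by these values
```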
It should be noted that, among the N +1 images, for the N images in the initial retrieval result, the fused image feature and the fused text feature corresponding to each image can be obtained through the multi-modal fusion model 20. For the image to be queried, however, the image feature extraction network 210 in the multi-modal fusion model 20 extracts only the image features of the image to be queried, yielding its fused image feature; that is, no fused text feature is obtained for the image to be queried in this process. In the prior art, the text features of the image to be queried may be extracted and used directly as its fused text feature, an approach that tends to reduce the accuracy of the final retrieval result. Based on this, compared with the above prior art, the present application further provides a method for determining the fused text feature of the image to be queried: specifically, the computer device may perform a weighted average of the first text features corresponding to the first L images in the initial retrieval result to obtain the fused text feature corresponding to the image to be queried, where L is an integer greater than 0 and smaller than N. For example, L = 3; as another example, L = 2. This implementation improves the accuracy of the final retrieval result.
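A minimal sketch of this weighted-average construction is given below (Python with NumPy; the rank-decay default weights are an assumption, since the description specifies only a weighted average of the first text features of the first L results).

```python
import numpy as np

def query_fused_text_feature(top_text_feats, weights=None):
    """Fused text feature of the query from the first L initial results.

    top_text_feats: first text features of the first L results, shape (L, d)
    """
    L = top_text_feats.shape[0]
    if weights is None:
        weights = 1.0 / np.arange(1, L + 1)      # assumed rank-decay weights
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalize to sum to 1
    return weights @ top_text_feats              # shape (d,)
```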
To better illustrate that the method proposed by the present application can improve the accuracy of the final retrieval result, test results on the public dataset NUS-WIDE were obtained separately for an existing model (which reorders using single-modality image-feature data) and for the multi-modal fusion model 20 proposed by the present application. The test results are shown in table 1:
Model                      Test result
Existing model             77.45%
Multi-modal fusion model   79.24%

TABLE 1
As can be seen from table 1, the test results of the multi-modal fusion model are superior to those of the existing model.
By implementing the embodiments of the present application, because the obtained second image feature and second text feature take the neighbor relationships between modalities in the target feature space into account, the obtained fused image feature and fused text feature can retain the neighbor relationships of the original spaces; therefore, when the initial retrieval result is reordered based on the fused image feature and the fused text feature corresponding to each image, the accuracy of the retrieval result can be improved. In the prior art, when a user is not satisfied with the final retrieval result, the computer device often has to perform the retrieval multiple times to obtain a retrieval result of high accuracy, which consumes a large amount of the device's resources, such as computing resources. In contrast, because the retrieval result of the present application already has high accuracy, the computer device does not need to perform the retrieval multiple times, so the resource consumption of the computer device can be reduced.
The foregoing fig. 1a to fig. 3d describe the image reordering method according to the embodiments of the present application in detail. The apparatus according to the embodiments of the present application is described below with reference to the accompanying drawings.
Fig. 4 is a schematic structural diagram of an image reordering device 40 according to an embodiment of the present disclosure. The image reordering apparatus 40 shown in fig. 4 may include:
an image acquiring unit 400, configured to acquire an image to be queried;
a first retrieving unit 402, configured to extract an image feature of the image to be queried, and retrieve in an image database according to the image feature to obtain an initial retrieval result; wherein the initial retrieval result comprises N images arranged in descending order of feature similarity; each of the N images comprises first image features used for representing the color, texture, shape and spatial relationship of the image and first text features used for representing the text information of the image; N is an integer greater than 0; the first image feature is a feature in an image feature space; the first text feature is a feature in a text feature space;
a first feature mapping unit 404, configured to map a first image feature and a first text feature, which correspond to each image in the initial search result, to the same target feature space, so as to obtain a second image feature and a second text feature;
a second feature mapping unit 406, configured to remap the second image feature to the image feature space, so as to obtain a fused image feature corresponding to each image; remapping the second text characteristic to the text characteristic space to obtain a fusion text characteristic corresponding to each image; the fused image features have a neighbor relationship with other image features within the same modality; the fusion text features have a neighbor relation with other text features in the same modality; one feature type is used to characterize a modality;
the second retrieving unit 408 is configured to reorder the initial retrieval result based on the fused image feature and/or the fused text feature corresponding to each image, so as to obtain a final retrieval result.
In one possible implementation manner, the fused image feature includes a third image feature and a third text feature, and the proportion of the third image feature in the fused image feature is higher than that of the third text feature; the fused text feature comprises a fourth image feature and a fourth text feature, and the occupation ratio of the fourth text feature is higher than that of the fourth image feature in the fused text feature.
In one possible implementation, the second retrieving unit 408 comprises a distance calculating unit 4081 and a reordering unit 4082, wherein,
the distance calculation unit 4081 is configured to determine, in the N +1 images, a distance between each two images according to the fusion image feature and the fusion text feature included in each two images; the N +1 images comprise the image to be inquired and N images in the initial retrieval result;
the reordering unit 4082 is configured to determine, based on the determined distances, a K mutual neighbor relationship corresponding to each of the N +1 images, where the K mutual neighbor relationship is used to characterize that an image a is a K neighbor of an image b, and the image b is also a K neighbor of the image a; calculating the Jacard Jaccard distance between the image to be inquired and each image in the initial retrieval result according to the K mutual neighbor relation; and reordering the initial retrieval result according to the Jaccard distance.
In one possible implementation, for the ith image and the jth image in the N +1 images; wherein the ith image comprises a fused image feature X and a fused text feature Y; the jth image comprises a fused image feature P and a fused text feature Q; the distance calculation unit 4081 is specifically configured to:
determining a first distance according to the fused image feature X and the fused image feature P;
determining a second distance according to the fusion text characteristic Y and the fusion text characteristic Q;
determining a third distance according to the fused image feature X and the fused text feature Q;
determining a fourth distance according to the fusion text characteristic Y and the fusion image characteristic P;
determining a distance between the ith image and the jth image according to the first distance, the second distance, the third distance, and the fourth distance.
In one possible implementation, the first distance and the second distance are used to characterize the distance of the ith image and the jth image within the same modality; the third distance and the fourth distance are used for representing the distance between different modalities of the ith image and the jth image.
In one possible implementation, the second retrieving unit 408 includes a feature similarity calculating unit and a reordering unit, wherein,
the feature similarity calculation unit is used for sequentially obtaining the similarity between the fusion image features corresponding to the image to be inquired and the fusion image features corresponding to each image in the initial retrieval result;
and the reordering unit is used for reordering the initial retrieval results according to the determined similarity.
In one possible implementation, the second retrieving unit 408 includes a feature similarity calculating unit and a reordering unit, wherein,
the feature similarity calculation unit is used for sequentially obtaining the similarity between the fusion text features corresponding to the image to be inquired and the fusion text features corresponding to each image in the initial retrieval result;
and the reordering unit is used for reordering the initial retrieval results according to the determined similarity.
In one possible implementation manner, the apparatus may further include:
the feature extraction unit 4010 is configured to perform weighted average on first text features corresponding to the first L images in the initial search result, so as to obtain fused text features corresponding to the image to be queried; and L is an integer which is more than 0 and less than N.
In the embodiment of the present application, specific implementations of each unit may refer to related descriptions in the above embodiments, and are not described herein again.
By implementing the embodiments of the present application, because the obtained second image feature and second text feature take the neighbor relationships between modalities in the target feature space into account, the obtained fused image feature and fused text feature can retain the neighbor relationships of the original spaces; therefore, when the initial retrieval result is reordered based on the fused image feature and the fused text feature corresponding to each image, the accuracy of the retrieval result can be improved. In the prior art, when a user is not satisfied with the final retrieval result, the computer device often has to perform the retrieval multiple times to obtain a retrieval result of high accuracy, which consumes a large amount of the device's resources, such as computing resources. In contrast, because the retrieval result of the present application already has high accuracy, the computer device does not need to perform the retrieval multiple times, so the resource consumption of the computer device can be reduced.
As shown in fig. 5, an image reordering device 50 provided in an embodiment of the present application may include a processor 501, a memory 502, a communication bus 503 and a communication interface 504, wherein the processor 501 is connected to the memory 502 and the communication interface 504 through the communication bus 503.
The processor 501 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), a neural Network Processing Unit (NPU), or one or more integrated circuits, and is configured to execute related programs so as to perform the image reordering method described in the embodiments of the present application.
The processor 501 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the image reordering method of the present application may be implemented by integrated logic circuits of hardware or by instructions in the form of software in the processor 501. The processor 501 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or registers. The storage medium is located in the memory 502; the processor 501 reads the information in the memory 502 and, in combination with its hardware, executes the image reordering method of the embodiments of the present application.
The memory 502 may be a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a Random Access Memory (RAM). The memory 502 may store programs and data. When a program stored in the memory 502 is executed by the processor 501, the processor 501 and the communication interface 504 are used to execute the steps of the image reordering method of the embodiments of the present application.
For example, the memory 502 may store a program for implementing the image reordering method in the embodiments of the present application.
The communication interface 504 enables communication between the image reordering device 50 and other devices or communication networks using transceiver means such as, but not limited to, a transceiver.
Optionally, the image reordering device may further include an artificial intelligence processor 505, which may be any processor suitable for large-scale exclusive-or operation processing, such as a neural Network Processing Unit (NPU), a Tensor Processing Unit (TPU) or a Graphics Processing Unit (GPU). The artificial intelligence processor 505 may be mounted as a coprocessor onto a host CPU, which assigns tasks to it. The artificial intelligence processor 505 may implement one or more of the operations involved in the image reordering method described above. For example, taking an NPU as an example, the core portion of the NPU is an arithmetic circuit, and a controller controls the arithmetic circuit to extract matrix data from the memory 502 and perform multiply-add operations.
The processor 501 is used for calling data and program codes in the memory and executing:
acquiring an image to be inquired;
extracting image features of the image to be queried, and retrieving in an image database according to the image features to obtain an initial retrieval result; wherein the initial retrieval result comprises N images arranged in descending order of feature similarity; each of the N images comprises first image features used for representing the color, texture, shape and spatial relationship of the image and first text features used for representing the text information of the image; N is an integer greater than 0; the first image feature is a feature in an image feature space; the first text feature is a feature in a text feature space;
mapping a first image feature and a first text feature corresponding to each image in the initial retrieval result to the same target feature space to obtain a second image feature and a second text feature;
remapping the second image characteristics to the image characteristic space to obtain fused image characteristics corresponding to each image; remapping the second text characteristic to the text characteristic space to obtain a fusion text characteristic corresponding to each image; the fused image features have a neighbor relation with other image features within the same modality; the fusion text features have a neighbor relation with other text features in the same modality; one feature type is used to characterize a modality;
and reordering the initial retrieval result based on the fusion image characteristic and/or the fusion text characteristic corresponding to each image to obtain a final retrieval result.
Wherein the fused image feature comprises a third image feature and a third text feature, and the proportion of the third image feature is higher than that of the third text feature in the fused image feature; the fused text feature comprises a fourth image feature and a fourth text feature, and the occupation ratio of the fourth text feature is higher than that of the fourth image feature in the fused text feature.
The reordering of the initial search result by the fused image feature and the fused text feature corresponding to each image by the processor 501 may include:
in the N +1 images, determining the distance between every two images according to the fusion image characteristics and the fusion text characteristics contained in every two images; the N +1 images comprise the image to be inquired and N images in the initial retrieval result;
determining a K mutual neighbor relation corresponding to each image in the N +1 images based on the determined distance, wherein the K mutual neighbor relation is used for representing that the image a is a K neighbor of the image b, and the image b is also a K neighbor of the image a;
calculating the Jaccard distance between the image to be inquired and each image in the initial retrieval result according to the K mutual neighbor relation;
and reordering the initial retrieval result according to the Jaccard distance.
Wherein, aiming at the ith image and the jth image in the N +1 images; wherein the ith image comprises a fused image feature X and a fused text feature Y; the jth image comprises a fused image feature P and a fused text feature Q; the processor 501 determines a distance between two images according to the feature of the fused image and the feature of the fused text included in each of the two images, which may include:
determining a first distance according to the fused image feature X and the fused image feature P;
determining a second distance according to the fusion text characteristic Y and the fusion text characteristic Q;
determining a third distance according to the fused image feature X and the fused text feature Q;
determining a fourth distance according to the fusion text characteristic Y and the fusion image characteristic P;
determining a distance between the ith image and the jth image according to the first distance, the second distance, the third distance, and the fourth distance.
Wherein the first distance and the second distance are used to characterize the distance of the ith image and the jth image within the same modality; the third distance and the fourth distance are used for representing the distance between different modalities of the ith image and the jth image.
The reordering of the initial search results by the processor 501 based on the fused image features corresponding to each image may include:
sequentially acquiring the similarity between the fusion image characteristics corresponding to the image to be inquired and the fusion image characteristics corresponding to each image in the initial retrieval result;
and reordering the initial retrieval results according to the determined similarity.
The reordering, by the processor 501, the initial search result based on the fused text feature corresponding to each image may include:
sequentially acquiring the similarity between the fusion text features corresponding to the image to be inquired and the fusion text features corresponding to each image in the initial retrieval result;
and reordering the initial retrieval results according to the determined similarity.
Before determining the distance between each two images in the N +1 images according to the fused image feature and the fused text feature respectively included in each two images, the processor 501 may be further configured to:
carrying out weighted average on first text features corresponding to the previous L images in the initial retrieval result to obtain fusion text features corresponding to the image to be inquired; and L is an integer which is more than 0 and less than N.
It should be understood that, for the implementation of each device, reference may also be made to the corresponding descriptions in the foregoing image reordering method embodiments; details are not repeated here.
The embodiments of the present application also provide a computer storage medium having instructions stored therein which, when executed on a computer or a processor, cause the computer or the processor to perform one or more steps of the method according to any one of the above embodiments. If each constituent module of the above-mentioned apparatus is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part of it that contributes to the prior art, may be embodied in the form of a software product, and the computer software product is stored in the computer-readable storage medium.
The computer readable storage medium may be an internal storage unit of the device according to the foregoing embodiment, for example, a hard disk or a memory. The computer readable storage medium may be an external storage device of the above-described apparatus, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the apparatus. The above-described computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the above embodiments of the methods when the computer program is executed. And the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The modules in the device can be merged, divided and deleted according to actual needs.
It will be appreciated by those of ordinary skill in the art that the units and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware or as combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Those of skill would appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps disclosed in the various embodiments disclosed herein may be implemented as hardware, software, firmware, or any combination thereof. If implemented in software, the functions described in the various illustrative logical blocks, modules, and steps may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium, such as a data storage medium, or any communication medium including a medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described herein. The computer program product may include a computer-readable medium.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

1. A method of reordering images, comprising:
acquiring an image to be inquired;
extracting image features of the image to be inquired, and retrieving in an image database according to the image features to obtain an initial retrieval result; wherein the initial retrieval result comprises N images; each of the N images each includes a first image feature and a first text feature; n is an integer greater than 0; the first image feature is a feature in an image feature space; the first text feature is a feature in a text feature space;
mapping a first image feature and a first text feature corresponding to each image in the initial retrieval result to the same target feature space to obtain a second image feature and a second text feature;
remapping the second image characteristic to the image characteristic space to obtain a fused image characteristic corresponding to each image; remapping the second text characteristic to the text characteristic space to obtain a fusion text characteristic corresponding to each image; the fused image feature comprises a third image feature and a third text feature; the fused text feature comprises a fourth image feature and a fourth text feature;
and sequencing the initial retrieval results based on the fusion image features and/or the fusion text features corresponding to the images to obtain final retrieval results.
2. The method of claim 1, wherein in the fused image feature, a proportion of the third image feature is higher than a proportion of the third text feature; in the fused text feature, the occupation ratio of the fourth text feature is higher than that of the fourth image feature.
3. The method of claim 1 or 2, wherein the reordering of the initial search results based on the fused image features and fused text features corresponding to each of the images comprises:
in the N +1 images, determining the distance between every two images according to the fusion image characteristics and the fusion text characteristics contained in every two images; the N +1 images comprise the image to be inquired and N images in the initial retrieval result;
determining a K mutual neighbor relation corresponding to each image in the N +1 images based on the determined distance, wherein the K mutual neighbor relation is used for representing that the image a is a K neighbor of the image b, and the image b is also a K neighbor of the image a;
calculating the Jaccard distance between the image to be inquired and each image in the initial retrieval result according to the K mutual neighbor relation;
and sorting the initial retrieval results according to the Jaccard distance.
4. The method of claim 3, wherein for the ith and jth images of the N +1 images; wherein the ith image comprises a fused image feature X and a fused text feature Y; the jth image comprises a fused image feature P and a fused text feature Q; the determining the distance between every two images according to the fusion image characteristics and the fusion text characteristics contained in every two images comprises the following steps:
determining a first distance according to the fused image feature X and the fused image feature P;
determining a second distance according to the fusion text feature Y and the fusion text feature Q;
determining a third distance according to the fused image feature X and the fused text feature Q;
determining a fourth distance according to the fusion text characteristic Y and the fusion image characteristic P;
determining a distance between the ith image and the jth image according to the first distance, the second distance, the third distance, and the fourth distance.
5. The method of claim 4, wherein the first distance and the second distance are used to characterize a distance of the ith image and the jth image within a same modality; the third distance and the fourth distance are used for representing the distance between different modalities of the ith image and the jth image.
6. The method of claim 1, wherein said reordering the initial search results based on the fused image features corresponding to each of the images comprises:
sequentially acquiring the similarity between the fusion image characteristics corresponding to the image to be inquired and the fusion image characteristics corresponding to each image in the initial retrieval result;
and sequencing the initial retrieval results according to the determined similarity.
7. The method of claim 1, wherein said reordering the initial search results based on the fused text feature corresponding to each of the images comprises:
sequentially acquiring the similarity between the fusion text features corresponding to the image to be inquired and the fusion text features corresponding to each image in the initial retrieval result;
and sequencing the initial retrieval results according to the determined similarity.
8. The method according to claim 3, wherein before determining the distance between two images in the N +1 images according to the fused image feature and the fused text feature contained in each of the two images, the method further comprises:
carrying out weighted average on first text features corresponding to the previous L images in the initial retrieval result to obtain fusion text features corresponding to the image to be inquired; and L is an integer which is more than 0 and less than N.
9. An image reordering apparatus, comprising:
the image acquiring unit is used for acquiring an image to be inquired;
the first retrieval unit is used for extracting the image characteristics of the image to be queried and retrieving in an image database according to the image characteristics to obtain an initial retrieval result; wherein the initial retrieval result comprises N images; each of the N images each includes a first image feature and a first text feature; n is an integer greater than 0; the first image feature is a feature in an image feature space; the first text feature is a feature in a text feature space;
the first feature mapping unit is used for mapping the first image feature and the first text feature corresponding to each image in the initial retrieval result to the same target feature space to obtain a second image feature and a second text feature;
the second feature mapping unit is used for remapping the second image features to the image feature space to obtain fused image features corresponding to each image; remapping the second text characteristic to the text characteristic space to obtain a fusion text characteristic corresponding to each image; the fused image feature comprises a third image feature and a third text feature; the fused text feature comprises a fourth image feature and a fourth text feature;
and the second retrieval unit is used for sequencing the initial retrieval results based on the fused image characteristics and/or the fused text characteristics corresponding to each image to obtain final retrieval results.
10. The apparatus of claim 9, wherein in the fused image feature, a duty ratio of the third image feature is higher than a duty ratio of the third text feature; in the fused text feature, the occupation ratio of the fourth text feature is higher than that of the fourth image feature.
11. The apparatus according to claim 9 or 10, wherein the second retrieving unit comprises a distance calculating unit and a reordering unit, wherein,
the distance calculation unit is used for determining the distance between every two images in the N +1 images according to the fused image characteristics and the fused text characteristics contained in every two images; the N +1 images comprise the image to be inquired and N images in the initial retrieval result;
the reordering unit is used for determining a K mutual neighbor relation corresponding to each image in the N +1 images based on the determined distance, wherein the K mutual neighbor relation is used for representing that the image a is a K neighbor of the image b, and the image b is also a K neighbor of the image a; calculating the Jacard Jaccard distance between the image to be inquired and each image in the initial retrieval result according to the K mutual neighbor relation; and sorting the initial retrieval results according to the Jaccard distance.
12. The apparatus of claim 11, wherein for an ith image and a jth image of the N +1 images; wherein the ith image comprises a fused image feature X and a fused text feature Y; the jth image comprises a fused image feature P and a fused text feature Q; the distance calculation unit is specifically configured to:
determining a first distance according to the fused image feature X and the fused image feature P;
determining a second distance according to the fusion text feature Y and the fusion text feature Q;
determining a third distance according to the fused image feature X and the fused text feature Q;
determining a fourth distance according to the fusion text characteristic Y and the fusion image characteristic P;
determining a distance between the ith image and the jth image according to the first distance, the second distance, the third distance, and the fourth distance.
13. The apparatus of claim 12, wherein the first distance and the second distance are used to characterize a distance that the ith image and the jth image are within a same modality; the third distance and the fourth distance are used for representing the distance between different modalities of the ith image and the jth image.
14. The apparatus of claim 9, wherein the second retrieving unit comprises a feature similarity calculation unit and a reordering unit, wherein,
the feature similarity calculation unit is used for sequentially obtaining the similarity between the fusion image features corresponding to the image to be inquired and the fusion image features corresponding to each image in the initial retrieval result;
and the reordering unit is used for ordering the initial retrieval result according to the determined similarity.
15. The apparatus of claim 9, wherein the second retrieval unit comprises a feature similarity calculation unit and a reordering unit, wherein,
the feature similarity calculation unit is used for sequentially obtaining the similarity between the fusion text features corresponding to the image to be inquired and the fusion text features corresponding to each image in the initial retrieval result;
and the reordering unit is used for ordering the initial retrieval result according to the determined similarity.
16. The apparatus of claim 9 or 10, wherein the apparatus further comprises:
the feature extraction unit is used for performing weighted average on first text features corresponding to the previous L images in the initial retrieval result to obtain fusion text features corresponding to the image to be queried; and L is an integer which is more than 0 and less than N.
17. An image reordering device comprising a processor and a memory, said processor and memory being interconnected, wherein said memory is adapted to store a computer program comprising program instructions, said processor being configured to invoke said program instructions to perform the method according to any of claims 1-8.
18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-8.
CN202210475225.7A 2020-09-23 2020-09-23 Image reordering method, related device and computer readable storage medium Active CN114969417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210475225.7A CN114969417B (en) 2020-09-23 2020-09-23 Image reordering method, related device and computer readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011012034.4A CN112256899B (en) 2020-09-23 2020-09-23 Image reordering method, related device and computer readable storage medium
CN202210475225.7A CN114969417B (en) 2020-09-23 2020-09-23 Image reordering method, related device and computer readable storage medium

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202011012034.4A Division CN112256899B (en) 2020-09-23 2020-09-23 Image reordering method, related device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN114969417A CN114969417A (en) 2022-08-30
CN114969417B true CN114969417B (en) 2023-04-11

Family

ID=74231964

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210475225.7A Active CN114969417B (en) 2020-09-23 2020-09-23 Image reordering method, related device and computer readable storage medium
CN202011012034.4A Active CN112256899B (en) 2020-09-23 2020-09-23 Image reordering method, related device and computer readable storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202011012034.4A Active CN112256899B (en) 2020-09-23 2020-09-23 Image reordering method, related device and computer readable storage medium

Country Status (1)

Country Link
CN (2) CN114969417B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784086A (en) * 2021-01-28 2021-05-11 北京有竹居网络技术有限公司 Picture screening method and device, storage medium and electronic equipment
CN113656668B (en) * 2021-08-19 2022-10-11 北京百度网讯科技有限公司 Retrieval method, management method, device, equipment and medium of multi-modal information base
CN113688263B (en) * 2021-10-26 2022-01-18 北京欧应信息技术有限公司 Method, computing device, and storage medium for searching for image
CN118364127B (en) * 2024-06-20 2024-09-06 中南大学 Home textile image retrieval and reordering method and device based on feature fusion

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019148898A1 (en) * 2018-02-01 2019-08-08 北京大学深圳研究生院 Adversarial cross-media retrieving method based on restricted text space

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090292685A1 (en) * 2008-05-22 2009-11-26 Microsoft Corporation Video search re-ranking via multi-graph propagation
JP4999886B2 (en) * 2009-06-09 2012-08-15 ヤフー株式会社 Image search device
CN104572651B (en) * 2013-10-11 2017-09-29 华为技术有限公司 Picture sort method and device
US9836671B2 (en) * 2015-08-28 2017-12-05 Microsoft Technology Licensing, Llc Discovery of semantic similarities between images and text
US10026020B2 (en) * 2016-01-15 2018-07-17 Adobe Systems Incorporated Embedding space for images with multiple text labels
US10642887B2 (en) * 2016-12-27 2020-05-05 Adobe Inc. Multi-modal image ranking using neural networks
CN107357884A (en) * 2017-07-10 2017-11-17 中国人民解放军国防科学技术大学 A kind of different distance measure across media based on two-way study sequence
CN107657008B (en) * 2017-09-25 2020-11-03 中国科学院计算技术研究所 Cross-media training and retrieval method based on deep discrimination ranking learning
CN108446404B (en) * 2018-03-30 2021-01-05 中国科学院自动化研究所 Search method and system for unconstrained visual question-answer pointing problem
CN109255047A (en) * 2018-07-18 2019-01-22 西安电子科技大学 Based on the complementary semantic mutual search method of image-text being aligned and symmetrically retrieve
CN109783655B (en) * 2018-12-07 2022-12-30 西安电子科技大学 Cross-modal retrieval method and device, computer equipment and storage medium
CN109858555B (en) * 2019-02-12 2022-05-17 北京百度网讯科技有限公司 Image-based data processing method, device, equipment and readable storage medium
CN110442741B (en) * 2019-07-22 2022-10-18 成都澳海川科技有限公司 Tensor fusion and reordering-based cross-modal image-text mutual search method
CN111324765A (en) * 2020-02-07 2020-06-23 复旦大学 Fine-grained sketch image retrieval method based on depth cascade cross-modal correlation

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019148898A1 (en) * 2018-02-01 2019-08-08 北京大学深圳研究生院 Adversarial cross-media retrieving method based on restricted text space

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xueming Qian. Image Re-Ranking Based on Topic Diversity. IEEE Transactions on Image Processing. 2017, Vol. 26, No. 8, 3734-3747. *
Zhang Gengxing. Research on Image Re-ranking Based on Multi-modal Feature Fusion. China Master's Theses Full-text Database, Information Science and Technology. 2016, No. 2, full text. *

Also Published As

Publication number Publication date
CN112256899B (en) 2022-05-10
CN112256899A (en) 2021-01-22
CN114969417A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN114969417B (en) Image reordering method, related device and computer readable storage medium
JP6843086B2 (en) Image processing systems, methods for performing multi-label semantic edge detection in images, and non-temporary computer-readable storage media
CN109522942B (en) Image classification method and device, terminal equipment and storage medium
EP3627397B1 (en) Processing method and apparatus
US20200074205A1 (en) Methods and apparatuses for vehicle appearance feature recognition, methods and apparatuses for vehicle retrieval, storage medium, and electronic devices
US20220058429A1 (en) Method for fine-grained sketch-based scene image retrieval
CN110838125B (en) Target detection method, device, equipment and storage medium for medical image
CN109101946B (en) Image feature extraction method, terminal device and storage medium
JP7559063B2 (en) FACE PERSHING METHOD AND RELATED DEVICE
CN114332680A (en) Image processing method, video searching method, image processing device, video searching device, computer equipment and storage medium
CN113762309B (en) Object matching method, device and equipment
KR102576157B1 (en) Method and apparatus for high speed object detection using artificial neural network
CN113657087B (en) Information matching method and device
WO2022100607A1 (en) Method for determining neural network structure and apparatus thereof
CN114612681A (en) GCN-based multi-label image classification method, model construction method and device
CN111126049B (en) Object relation prediction method, device, terminal equipment and readable storage medium
CN117058517A (en) Helmet detection method, device and medium based on YOLOv5 optimization model
CN113627421B (en) Image processing method, training method of model and related equipment
CN110853115A (en) Method and equipment for creating development process page
CN114821140A (en) Image clustering method based on Manhattan distance, terminal device and storage medium
CN111931841A (en) Deep learning-based tree processing method, terminal, chip and storage medium
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN112650869A (en) Image retrieval reordering method and device, electronic equipment and storage medium
CN113157963A (en) Image screening method, device electronic equipment and readable storage medium
CN114692715A (en) Sample labeling method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant