A Framework for Enabling Unpaired Multi-Modal Learning for Deep Cross-Modal Hashing Retrieval
Figure 1. The various pairwise relationships present in information retrieval datasets. (a) 1-1 Paired, (b) 1-Many Paired, (c) 1-1 Aligned Paired, (d) 1-Many Aligned Paired, and (e) Unpaired.
Figure 2. Overview of an end-to-end deep hashing architecture. This figure illustrates a simplified recreation of the Deep Cross-Modal Hashing (DCMH) [9] network architecture (CNN: Convolutional Neural Network, BOW: bag of words, FC: Fully Connected layers); a minimal code sketch of such a two-branch network follows these captions. Example elephant (1), spoon (2) and bicycle (3) images reprinted under Creative Commons attribution: (1) Title: Elephant Addo, Author: Mikefairbanks, Source: https://commons.wikimedia.org/wiki/File:Elephant_Addo.jpg, CC BY 2.0; (2) Title: Dessert Spoon, Author: Donovan Govan, Source: https://commons.wikimedia.org/wiki/File:Dessert_Spoon.jpg, CC BY-SA 3.0; (3) Title: Electric Bicycle, Author: Mikefairbanks, Source: https://commons.wikimedia.org/wiki/File:Electric_Bicycle.jpg, CC BY-SA 3.0.
Figure 3. Simplified workflow of adversarial-based CMH methods, depicting approaches used by methods such as Deep Adversarial Discrete Hashing (DADH) [12] and Adversary Guided Asymmetric Hashing (AGAH) [10] (CNN: Convolutional Neural Network, BOW: bag of words, FC: Fully Connected layers).
Figure 4. Unpaired Multi-Modal Learning (UMML) framework workflow. The diagram shows an example in which 50% of the images are unpaired and the corresponding 50% of text Bag-of-Words (BoW) binary vectors are emptied. Similarly, in the case of text being unpaired, the image feature matrices would be emptied (CNN: Convolutional Neural Network).
Figure 5. Results (mAP) on MIR-Flickr25K and NUS-WIDE with unpaired images, i.e., images with no corresponding text. The ‘Paired’ points show results when training with a fully paired training set. Subsequent points show results with increasing amounts of unpaired images in the training set, in increments of 20%.
Figure 6. Results (mAP) on MIR-Flickr25K and NUS-WIDE with unpaired text, i.e., text with no corresponding images. The ‘Paired’ points show results when training with a fully paired training set. Subsequent points show results with increasing amounts of unpaired text in the training set, in increments of 20%.
Figure 7. Results (mAP) on MIR-Flickr25K and NUS-WIDE with unpaired images and text, i.e., images with no corresponding text and vice versa. The ‘Paired’ points show results when training with a fully paired training set. Subsequent points show results with increasing amounts of unpaired images and text in the training set; for example, ‘10%/10%’ refers to 10% of the training set being unpaired images and another 10% being unpaired text, for a total of 20% of the dataset being unpaired samples.
Figure 8. In (a), 20% of the training set was discarded. In (b), 20% of the training set was unpaired. In this example, for both (a) and (b), the model will be trained on 8000 paired samples. However, (b) will also train with its additional 2000 unpaired samples. This way, the effect of training with or without the additional unpaired samples can be investigated.
Figure 9. Results (mAP) on MIR-Flickr25K and NUS-WIDE with sample discarding, i.e., the training set being reduced. The ‘Full’ points show results when training with the full, unaltered training set. Subsequent points show results with decreasing amounts of samples, where the given percentage denotes the percentage of samples in the training set which have been discarded. The ‘Random’ points hold the baseline random performance values.
Figure 10. Percentage of performance change of DADH, computed using Formula (4), when training with unpaired samples compared to paired training, across the 24 classes of MIR-Flickr25K. Red bars show the five classes with the most performance change and green bars show the five classes with the least performance change. The remaining classes are marked as blue bars.
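To make the generic architecture of Figure 2 concrete, the following is a minimal, illustrative PyTorch sketch of a two-branch deep hashing network: a CNN branch hashes images and a fully connected branch hashes BoW text vectors into a common k-bit Hamming space. It is a simplified stand-in rather than the DCMH, DADH, or AGAH implementation; the ResNet-18 backbone, layer sizes, BoW dimensionality, and the tanh relaxation of the sign function are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class ImageHashNet(nn.Module):
    """CNN branch: image -> k-bit hash code (relaxed to [-1, 1] during training)."""
    def __init__(self, code_len: int = 64):
        super().__init__()
        backbone = models.resnet18(weights=None)   # stand-in backbone; DCMH used a CNN-F-style network
        backbone.fc = nn.Identity()                # expose the 512-d image features
        self.backbone = backbone
        self.hash_layer = nn.Linear(512, code_len)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.hash_layer(self.backbone(images)))

class TextHashNet(nn.Module):
    """Text branch: bag-of-words vector -> k-bit hash code via fully connected layers."""
    def __init__(self, bow_dim: int, code_len: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(bow_dim, 4096), nn.ReLU(),
            nn.Linear(4096, code_len),
        )

    def forward(self, bow: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.net(bow))

# Usage: hash a batch of images and BoW vectors into the shared Hamming space,
# then binarise with sign() for retrieval.
img_net, txt_net = ImageHashNet(code_len=64), TextHashNet(bow_dim=1386, code_len=64)
img_codes = torch.sign(img_net(torch.randn(8, 3, 224, 224)))
txt_codes = torch.sign(txt_net(torch.rand(8, 1386)))
```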
Abstract
1. Introduction
- A comprehensive overview of CMH methods, specifically in the context of utilising unpaired data. The current state of CMH is surveyed, the different pairwise relationship forms in which data can be represented are identified, and the current use, or lack of use, of unpaired data is discussed. Although surveys of hashing and retrieval methods exist [6,13,14], the literature does not provide an overview of CMH methods applied to unpaired data. The aspects which bind current CMH methods to paired data are also discussed.
- A new framework for Unpaired Multi-Modal Learning (UMML) to enable the training of otherwise pairwise-constrained CMH methods on unpaired data. Pairwise-constrained CMH methods cannot inherently include unpaired samples in their learning process. Using the proposed framework, the MIR-Flickr25K and NUS-WIDE datasets are adapted to enable the training of pairwise-constrained CMH methods when the training set contains unpaired images, unpaired text, or both unpaired images and text (a minimal illustrative sketch of this dataset adaptation follows this list).
- Experiments were carried out to (1) evaluate state-of-the-art CMH methods under the proposed UMML framework when training with paired and unpaired data samples, and (2) provide insight into whether unpaired data samples can be utilised during training, reflecting real-world use cases where paired data may not be available but a network needs to be trained for a CMR task.
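To illustrate the kind of dataset adaptation referred to above (and depicted in Figure 4), the sketch below marks a chosen fraction of training images as unpaired by emptying their BoW text vectors, and likewise empties the image feature rows of unpaired text. This is a minimal NumPy mock-up under assumed array layouts and function names, not the framework's released implementation.

```python
import numpy as np

def umml_adapt(image_feats: np.ndarray,
               text_bow: np.ndarray,
               unpaired_image_frac: float = 0.2,
               unpaired_text_frac: float = 0.0,
               seed: int = 0):
    """Return copies of the modality matrices with the missing counterpart zeroed out.

    image_feats: (N, D_img) image feature matrix
    text_bow:    (N, D_txt) binary bag-of-words matrix
    Unpaired images keep their features but get an emptied BoW vector;
    unpaired text keeps its BoW vector but gets an emptied image feature row.
    """
    n = image_feats.shape[0]
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)

    n_ui = int(unpaired_image_frac * n)          # images with no corresponding text
    n_ut = int(unpaired_text_frac * n)           # text with no corresponding image
    unpaired_img_idx = perm[:n_ui]
    unpaired_txt_idx = perm[n_ui:n_ui + n_ut]    # disjoint from the unpaired images

    img_out, txt_out = image_feats.copy(), text_bow.copy()
    txt_out[unpaired_img_idx] = 0                # empty BoW vectors of unpaired images
    img_out[unpaired_txt_idx] = 0                # empty feature rows of unpaired text
    return img_out, txt_out

# Example: 20% unpaired images and 20% unpaired text in a toy training set.
imgs = np.random.rand(10000, 4096)
txts = (np.random.rand(10000, 1386) > 0.99).astype(np.float32)
imgs_u, txts_u = umml_adapt(imgs, txts, unpaired_image_frac=0.2, unpaired_text_frac=0.2)
```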
2. Related Work
2.1. Multi-Modal Pairwise Relationship Types
2.2. Learning to Hash
2.3. Cross-Modal Hashing Categorisation
2.4. Unpaired Cross-Modal Hashing Methods
2.5. Architectural Reliance on Paired Samples of Existing CMH Methods
3. UMML: Proposed Unpaired Multi-Modal Learning Framework
4. Experiment Methodology
4.1. Datasets
4.2. Methods
4.3. Evaluation Metrics
5. Experiment Results
5.1. Training with Unpaired Images
- (1) Dataset impacts the performance of models. Different datasets produce different behaviours when unpaired samples are introduced into the training set. With MIR-Flickr25K, DADH and AGAH show different patterns of performance decrease for the image-to-text (I→T) and text-to-image (T→I) tasks, while with NUS-WIDE, DADH and AGAH show similar patterns for the two tasks. JDSH, on the other hand, shows similar patterns for both tasks on both datasets.
- (2) Percentage of unpairing may impact performance. For MIR-Flickr25K, the performance of DADH and AGAH on the I→T task degrades as the percentage of unpaired images increases. For the T→I task, however, with the exception of 100% image unpairing, performance was largely unaffected as the percentage of unpaired images increased. Once all images in the training set are unpaired (i.e., 100% unpairing), performance on both tasks across all methods averages 0.564 mAP for MIR-Flickr25K and 0.268 mAP for NUS-WIDE. These results are later compared to random performance baselines in Section 5.4 to determine the extent to which the methods are learning when trained with 100% unpaired images (a sketch of how mAP can be computed for hash-based retrieval follows this list).
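For reference when reading the mAP values above, the sketch below shows one common way of computing retrieval mAP over a Hamming ranking, assuming that a retrieved item is relevant to a query if the two share at least one class label and that average precision is taken over the full ranked list; the exact protocol used in the paper (for example, mAP over a top-K list) may differ.

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """mAP for hash-based retrieval.

    query_codes, db_codes: {-1, +1} binary codes, shapes (Q, k) and (N, k)
    query_labels, db_labels: multi-hot label matrices, shapes (Q, C) and (N, C)
    """
    aps = []
    for q_code, q_label in zip(query_codes, query_labels):
        hamming = 0.5 * (q_code.shape[0] - db_codes @ q_code)  # Hamming distance to every item
        order = np.argsort(hamming)                            # rank database by distance
        relevant = (db_labels[order] @ q_label) > 0            # shares at least one label
        if not relevant.any():
            continue
        hits = np.cumsum(relevant)
        precision_at_hit = hits[relevant] / (np.nonzero(relevant)[0] + 1)
        aps.append(precision_at_hit.mean())
    return float(np.mean(aps))
```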
5.2. Training with Unpaired Text
5.3. Training with Unpaired Images and Text
5.4. Training with Sample Discarding
5.5. Comparison to Other Unpaired CMH Methods
5.6. Class-by-Class Performance Evaluations
6. Conclusions
- Unpaired data can improve the training results of CMH methods. Furthermore, if data from both the image and text modalities are present in the training set, initially pairwise-constrained CMH methods can be trained on fully unpaired data.
- The extent to which unpaired data are helpful to the training process is relative to the amount of paired samples available: the scarcer the available paired samples, the more helpful additional unpaired samples can be for training.
- The performance the models achieve when using unpaired samples for training depends on the modality of the unpaired samples, the dataset being used, the class of the unpaired data, and the architecture of the CMH algorithm. These factors influence whether unpaired samples will be helpful to the training process.
- The proposed UMML framework adapts the dataset to enable pairwise-constrained CMH methods to train on unpaired samples. When using UMML to enable DADH, AGAH and JDSH to train with unpaired samples, the methods were observed to perform well. This suggests that further improvements may be observed if the architectures of these methods are adapted to train on unpaired data directly.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Baeza-Yates, R.; Ribeiro-Neto, B. Modern Information Retrieval. In Modern Information Retrieval; Association for Computing Machinery Press: New York, NY, USA, 1999; Volume 463. [Google Scholar]
- Lu, X.; Zhu, L.; Cheng, Z.; Song, X.; Zhang, H. Efficient Discrete Latent Semantic Hashing for Scalable Cross-Modal Retrieval. Signal Process. 2019, 154, 217–231. [Google Scholar] [CrossRef]
- Jin, L.; Li, K.; Li, Z.; Xiao, F.; Qi, G.J.; Tang, J. Deep Semantic-Preserving Ordinal Hashing for Cross-Modal Similarity Search. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 1429–1440. [Google Scholar] [CrossRef] [PubMed]
- Kumar, S.; Udupa, R. Learning Hash Functions for Cross-view Similarity Search. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, 16–22 July 2011. [Google Scholar]
- Zhang, D.; Li, W.J. Large-Scale Supervised Multimodal Hashing With Semantic Correlation Maximization. In Proceedings of the AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; Volume 28. [Google Scholar]
- Wang, J.; Zhang, T.; Sebe, N.; Shen, H.T. A Survey on Learning to Hash. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 769–790. [Google Scholar] [CrossRef] [PubMed]
- Deng, C.; Yang, E.; Liu, T.; Tao, D. Two-Stream Deep Hashing with Class-Specific Centers for Supervised Image Search. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 2189–2201. [Google Scholar] [CrossRef]
- Peng, Y.; Huang, X.; Zhao, Y. An Overview of Cross-Media Retrieval: Concepts, methodologies, Benchmarks, and Challenges. IEEE Trans. Circuits Syst. Video Technol. 2017, 28, 2372–2385. [Google Scholar] [CrossRef] [Green Version]
- Jiang, Q.Y.; Li, W.J. Deep Cross-Modal Hashing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3232–3240. [Google Scholar]
- Gu, W.; Gu, X.; Gu, J.; Li, B.; Xiong, Z.; Wang, W. Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval. In Proceedings of the 2019 on International Conference on Multimedia Retrieval, Ottawa, ON, Canada, 10–13 June 2019; pp. 159–167. [Google Scholar]
- Liu, S.; Qian, S.; Guan, Y.; Zhan, J.; Ying, L. Joint-Modal Distribution-Based Similarity Hashing for Large-Scale Unsupervised Deep Cross-Modal Retrieval. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, China, 25–30 July 2020; pp. 1379–1388. [Google Scholar]
- Bai, C.; Zeng, C.; Ma, Q.; Zhang, J.; Chen, S. Deep Adversarial Discrete Hashing for Cross-Modal Retrieval. In Proceedings of the 2020 International Conference on Multimedia Retrieval, Dublin, Ireland, 8–11 June 2020; pp. 525–531. [Google Scholar]
- Zheng, L.; Yang, Y.; Tian, Q. SIFT Meets CNN: A Decade Survey of Instance Retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1224–1244. [Google Scholar] [CrossRef] [Green Version]
- Wang, J.; Liu, W.; Kumar, S.; Chang, S.F. Learning to Hash for Indexing Big Data—A Survey. Proc. IEEE 2015, 104, 34–57. [Google Scholar] [CrossRef]
- Shen, H.T.; Liu, L.; Yang, Y.; Xu, X.; Huang, Z.; Shen, F.; Hong, R. Exploiting Subspace Relation in Semantic Labels for Cross-Modal Hashing. IEEE Trans. Knowl. Data Eng. 2020, 33, 3351–3365. [Google Scholar] [CrossRef]
- Ding, K.; Huo, C.; Fan, B.; Xiang, S.; Pan, C. In Defense of Locality-Sensitive Hashing. IEEE Trans. Neural Netw. Learn. Syst. 2016, 29, 87–103. [Google Scholar] [CrossRef]
- Hardoon, D.R.; Szedmak, S.; Shawe-Taylor, J. Canonical Correlation Analysis: An Overview with Application to Learning Methods. Neural Comput. 2004, 16, 2639–2664. [Google Scholar] [CrossRef]
- Liu, X.; Hu, Z.; Ling, H.; Cheung, Y.M. MTFH: A Matrix Tri-Factorization Hashing Framework for Efficient Cross-Modal Retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 964–981. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Cao, W.; Feng, W.; Lin, Q.; Cao, G.; He, Z. A Review of Hashing Methods for Multimodal Retrieval. IEEE Access 2020, 8, 15377–15391. [Google Scholar] [CrossRef]
- Pereira, J.C.; Coviello, E.; Doyle, G.; Rasiwasia, N.; Lanckriet, G.R.; Levy, R.; Vasconcelos, N. On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 521–535. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Young, P.; Lai, A.; Hodosh, M.; Hockenmaier, J. From Image Descriptions to Visual Denotations: New Similarity Metrics for Semantic Inference Over Event Descriptions. Trans. Assoc. Comput. Linguist. 2014, 2, 67–78. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Luo, X.; Wang, H.; Wu, D.; Chen, C.; Deng, M.; Huang, J.; Hua, X.S. A Survey on Deep Hashing Methods. ACM Trans. Knowl. Discov. Data 2022. [Google Scholar] [CrossRef]
- Strecha, C.; Bronstein, A.; Bronstein, M.; Fua, P. LDAHash: Improved Matching with Smaller Descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 66–78. [Google Scholar] [CrossRef] [Green Version]
- He, J.; Liu, W.; Chang, S.F. Scalable Similarity Search with Optimized Kernel Hashing. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 25–28 July 2010; pp. 1129–1138. [Google Scholar]
- Gui, J.; Liu, T.; Sun, Z.; Tao, D.; Tan, T. Supervised Discrete Hashing with Relaxation. IEEE Trans. Neural Netw. Learn. Syst. 2016, 29, 608–617. [Google Scholar] [CrossRef] [Green Version]
- Gionis, A.; Indyk, P.; Motwani, R. Similarity Search in High Dimensions via Hashing. Very Large Data Bases 1999, 99, 518–529. [Google Scholar]
- Zhu, X.; Huang, Z.; Shen, H.T.; Zhao, X. Linear Cross-Modal Hashing for Efficient Multimedia Search. In Proceedings of the 21st ACM International Conference on Multimedia, Barcelona, Spain, 21–25 October 2013; pp. 143–152. [Google Scholar]
- Ding, G.; Guo, Y.; Zhou, J. Collective Matrix Factorization Hashing for Multimodal Data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2075–2082. [Google Scholar]
- Lin, Z.; Ding, G.; Han, J.; Wang, J. Cross-View Retrieval via Probability-Based Semantics-Preserving Hashing. IEEE Trans. Cybern. 2016, 47, 4342–4355. [Google Scholar] [CrossRef] [Green Version]
- Liu, Q.; Liu, G.; Li, L.; Yuan, X.T.; Wang, M.; Liu, W. Reversed Spectral Hashing. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 2441–2449. [Google Scholar] [CrossRef]
- Liu, X.; Yu, G.; Domeniconi, C.; Wang, J.; Ren, Y.; Guo, M. Ranking-Based Deep Cross-Modal Hashing. Proc. AAAI Conf. Artif. Intell. 2019, 33, 4400–4407. [Google Scholar] [CrossRef]
- Wang, J.; Liu, W.; Sun, A.X.; Jiang, Y.G. Learning Hash Codes with Listwise Supervision. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 3032–3039. [Google Scholar]
- Jin, Z.; Hu, Y.; Lin, Y.; Zhang, D.; Lin, S.; Cai, D.; Li, X. Complementary Projection Hashing. In Proceedings of the IEEE International Conference on Computer Vision, Washington, DC, USA, 2–8 December 2013; pp. 257–264. [Google Scholar]
- Yang, E.; Deng, C.; Liu, W.; Liu, X.; Tao, D.; Gao, X. Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
- Li, C.; Deng, C.; Li, N.; Liu, W.; Gao, X.; Tao, D. Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4242–4251. [Google Scholar]
- Mandal, D.; Chaudhury, K.N.; Biswas, S. Generalized Semantic Preserving Hashing for N-Label Cross-Modal Retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4076–4084. [Google Scholar]
- Hu, Z.; Liu, X.; Wang, X.; Cheung, Y.M.; Wang, N.; Chen, Y. Triplet Fusion Network Hashing for Unpaired Cross-Modal Retrieval. In Proceedings of the 2019 on International Conference on Multimedia Retrieval, Ottawa, ON, Canada, 10–13 June 2019; pp. 141–149. [Google Scholar]
- Wen, X.; Han, Z.; Yin, X.; Liu, Y.S. Adversarial Cross-Modal Retrieval via Learning and Transferring Single-Modal Similarities. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 8–12 July 2019; pp. 478–483. [Google Scholar]
- Gao, J.; Zhang, W.; Zhong, F.; Chen, Z. UCMH: Unpaired Cross-Modal Hashing with Matrix Factorization. Elsevier Neurocomput. 2020, 418, 178–190. [Google Scholar] [CrossRef]
- Liu, W.; Wang, J.; Kumar, S.; Chang, S. Hashing with graphs. In Proceedings of the International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011. [Google Scholar]
- Cheng, M.; Jing, L.; Ng, M. Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval. ACM Trans. Inf. Syst. (TOIS) 2020, 38, 1–25. [Google Scholar] [CrossRef]
- Luo, K.; Zhang, C.; Li, H.; Jia, X.; Chen, C. Adaptive Marginalized Semantic Hashing for Unpaired Cross-Modal Retrieval. arXiv 2022, arXiv:2207.11880. [Google Scholar]
- Yu, G.; Liu, X.; Wang, J.; Domeniconi, C.; Zhang, X. Flexible Cross-Modal Hashing. IEEE Trans. Neural Netw. Learn. Syst. 2020, 33, 304–314. [Google Scholar] [CrossRef] [PubMed]
- Huiskes, M.J.; Lew, M.S. The MIR Flickr Retrieval Evaluation. In Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, Vancouver, BC, Canada, 30–31 October 2008; pp. 39–43. [Google Scholar]
- Chua, T.S.; Tang, J.; Hong, R.; Li, H.; Luo, Z.; Zheng, Y. NUS-WIDE: A Real-World Web Image Database From National University of Singapore. In Proceedings of the ACM International Conference on Image and Video Retrieval, Santorini Island, Greece, 8–10 July 2009; pp. 1–9. [Google Scholar]
Dataset | Train | Query | Retrieval
---|---|---|---
MIR-Flickr25K | | |
NUS-WIDE | | |
Image | Tag | Label/Class
---|---|---
MIR-Flickr25K example (1) | bilbao, 11–16, cielo, sky, polarizado, reflejo, reflection, sanidad, estrenandoMiRegalito, geotagged, geo:lat = 43.260867, geo:lon = −2.935705 | clouds, sky, structures
NUS-WIDE example (2) | cute, nature, squirrel, funny, boxer, boxing, cuteness, coolest, pugnacious, peopleschoice, naturesfinest, blueribbonwinner, animalkingdomelite, mywinners, abigfave, superaplus aplusphoto, vimalvinayan, natureoutpost | Animal, Nature
Task | Method | MIR-Flickr25K Paired | MIR 20% | MIR 40% | MIR 60% | MIR 80% | MIR 100% | NUS-WIDE Paired | NUS 20% | NUS 40% | NUS 60% | NUS 80% | NUS 100%
---|---|---|---|---|---|---|---|---|---|---|---|---|---
I→T | DADH | 0.836 | 0.807 | 0.789 | 0.750 | 0.702 | 0.562 | 0.701 | 0.690 | 0.683 | 0.656 | 0.646 | 0.297
I→T | AGAH | 0.803 | 0.752 | 0.729 | 0.695 | 0.637 | 0.535 | 0.633 | 0.621 | 0.583 | 0.587 | 0.503 | 0.267
I→T | JDSH | 0.672 | 0.653 | 0.648 | 0.643 | 0.619 | 0.555 | 0.546 | 0.534 | 0.510 | 0.457 | 0.402 | 0.253
T→I | DADH | 0.823 | 0.824 | 0.814 | 0.812 | 0.796 | 0.552 | 0.707 | 0.706 | 0.702 | 0.670 | 0.634 | 0.261
T→I | AGAH | 0.790 | 0.790 | 0.786 | 0.779 | 0.742 | 0.540 | 0.646 | 0.595 | 0.591 | 0.596 | 0.401 | 0.277
T→I | JDSH | 0.660 | 0.672 | 0.666 | 0.652 | 0.632 | 0.564 | 0.566 | 0.499 | 0.476 | 0.452 | 0.412 | 0.256
Task | Method | MIR-Flickr25K Paired | MIR 20% | MIR 40% | MIR 60% | MIR 80% | MIR 100% | NUS-WIDE Paired | NUS 20% | NUS 40% | NUS 60% | NUS 80% | NUS 100%
---|---|---|---|---|---|---|---|---|---|---|---|---|---
I→T | DADH | 0.836 | 0.831 | 0.831 | 0.826 | 0.820 | 0.525 | 0.701 | 0.700 | 0.696 | 0.683 | 0.674 | 0.282
I→T | AGAH | 0.803 | 0.755 | 0.740 | 0.720 | 0.682 | 0.541 | 0.633 | 0.597 | 0.566 | 0.500 | 0.356 | 0.267
I→T | JDSH | 0.672 | 0.646 | 0.621 | 0.608 | 0.580 | 0.553 | 0.546 | 0.515 | 0.478 | 0.393 | 0.342 | 0.254
T→I | DADH | 0.823 | 0.803 | 0.783 | 0.756 | 0.711 | 0.545 | 0.707 | 0.705 | 0.724 | 0.697 | 0.698 | 0.274
T→I | AGAH | 0.790 | 0.760 | 0.744 | 0.698 | 0.642 | 0.535 | 0.646 | 0.645 | 0.653 | 0.651 | 0.464 | 0.267
T→I | JDSH | 0.660 | 0.653 | 0.622 | 0.631 | 0.601 | 0.545 | 0.566 | 0.520 | 0.506 | 0.468 | 0.420 | 0.249
Task | Method | MIR-Flickr25K Paired | MIR 10%/10% (UI/UT) | MIR 20%/20% | MIR 30%/30% | MIR 40%/40% | MIR 50%/50% | NUS-WIDE Paired | NUS 10%/10% (UI/UT) | NUS 20%/20% | NUS 30%/30% | NUS 40%/40% | NUS 50%/50%
---|---|---|---|---|---|---|---|---|---|---|---|---|---
I→T | DADH | 0.836 | 0.820 | 0.822 | 0.752 | 0.728 | 0.760 | 0.701 | 0.696 | 0.676 | 0.663 | 0.676 | 0.662
I→T | AGAH | 0.803 | 0.741 | 0.737 | 0.664 | 0.673 | 0.693 | 0.633 | 0.642 | 0.637 | 0.561 | 0.564 | 0.567
I→T | JDSH | 0.672 | 0.652 | 0.643 | 0.609 | 0.610 | 0.591 | 0.546 | 0.547 | 0.503 | 0.398 | 0.306 | 0.259
T→I | DADH | 0.823 | 0.808 | 0.801 | 0.763 | 0.773 | 0.762 | 0.707 | 0.694 | 0.716 | 0.704 | 0.703 | 0.698
T→I | AGAH | 0.790 | 0.771 | 0.762 | 0.745 | 0.735 | 0.729 | 0.646 | 0.666 | 0.642 | 0.597 | 0.560 | 0.565
T→I | JDSH | 0.660 | 0.654 | 0.650 | 0.609 | 0.617 | 0.594 | 0.566 | 0.526 | 0.498 | 0.438 | 0.349 | 0.255
Task | Method | MIR-Flickr25K Full | MIR 20% | MIR 40% | MIR 60% | MIR 80% | MIR Random | NUS-WIDE Full | NUS 20% | NUS 40% | NUS 60% | NUS 80% | NUS Random
---|---|---|---|---|---|---|---|---|---|---|---|---|---
I→T | DADH | 0.836 | 0.824 | 0.799 | 0.779 | 0.744 | 0.543 | 0.701 | 0.683 | 0.648 | 0.610 | 0.575 | 0.260
I→T | AGAH | 0.803 | 0.763 | 0.737 | 0.714 | 0.678 | 0.548 | 0.633 | 0.633 | 0.588 | 0.440 | 0.366 | 0.267
I→T | JDSH | 0.672 | 0.657 | 0.655 | 0.640 | 0.634 | 0.551 | 0.546 | 0.543 | 0.523 | 0.469 | 0.457 | 0.256
T→I | DADH | 0.823 | 0.807 | 0.797 | 0.781 | 0.754 | 0.537 | 0.707 | 0.672 | 0.663 | 0.630 | 0.549 | 0.258
T→I | AGAH | 0.790 | 0.779 | 0.778 | 0.756 | 0.730 | 0.538 | 0.646 | 0.567 | 0.547 | 0.488 | 0.377 | 0.267
T→I | JDSH | 0.660 | 0.669 | 0.654 | 0.654 | 0.644 | 0.559 | 0.566 | 0.514 | 0.517 | 0.487 | 0.424 | 0.245
UI: unpaired images; UT: unpaired text; UIT: unpaired images and text; SD: sample discarding.

MIR-Flickr25K

Task | Method | 20% | 40% | 60% | 80% | 100%
---|---|---|---|---|---|---
I→T | DADH | UT (+0.86%) | UT (+3.97%) | UT (+6.02%) | UT (+10.16%) | UIT (+39.93%)
I→T | AGAH | SD | UT (+0.35%) | UT (+0.87%) | UT (+0.58%) | UIT (+26.43%)
I→T | JDSH | SD | SD | SD | SD | UIT (+7.26%)
T→I | DADH | UI (+2.02%) | UI (+2.16%) | UI (+4.04%) | UI (+5.57%) | UIT (+41.93%)
T→I | AGAH | UI (+1.52%) | UI (+0.95%) | UI (+2.98%) | UI (+1.67%) | UIT (+35.5%)
T→I | JDSH | UI (+0.45%) | UI (+1.83%) | SD | SD | UIT (+6.26%)
Both Tasks | DADH | UT (+0.19%) | UIT (+1.74%) | SD | UT (+2.20%) | UIT (+40.93%)
Both Tasks | AGAH | SD | SD | UI (+0.24%) | SD | UIT (+30.92%)
Both Tasks | JDSH | SD | UI (+0.38%) | UI (+0.08%) | SD | UIT (+6.76%)

NUS-WIDE

Task | Method | 20% | 40% | 60% | 80% | 100%
---|---|---|---|---|---|---
I→T | DADH | UT (+2.52%) | UT (+7.49%) | UT (+11.98%) | UT (+17.12%) | UIT (+154.54%)
I→T | AGAH | UIT (+1.36%) | UIT (+8.46%) | UI (+33.40%) | UIT (+54.17%) | UIT (+112.43%)
I→T | JDSH | SD | SD | SD | SD | SD
T→I | DADH | UT (+5.09%) | UT (+9.15%) | UIT (+11.65%) | UIT (+28.05%) | UIT (+170.35%)
T→I | AGAH | UIT (+17.58%) | UT (+17.19%) | UT (+26.61%) | UT (+52.36%) | UIT (+111.42%)
T→I | JDSH | UT (+1.17%) | SD | SD | SD | SD
Both Tasks | DADH | UT (+3.70%) | UT (+8.33%) | UT (+11.25%) | UIT (+22.67%) | UIT (+162.41%)
Both Tasks | AGAH | UIT (+9.02%) | UIT (+12.75%) | UI (+27.49%) | UIT (+51.35%) | UIT (+111.92%)
Both Tasks | JDSH | SD | SD | SD | SD | SD
Method (Fully Unpaired) | MIR-Flickr25K I→T | MIR-Flickr25K T→I | NUS-WIDE I→T | NUS-WIDE T→I
---|---|---|---|---
AMSH [43] | 0.758 | 0.840 | 0.657 | 0.805
RUCMH [42] | 0.719 | 0.732 | 0.650 | 0.657
FlexCMH [44] | 0.572 | 0.568 | 0.426 | 0.418
DADH + UMML | 0.760 | 0.762 | 0.662 | 0.698
AGAH + UMML | 0.693 | 0.729 | 0.567 | 0.565
JDSH + UMML | 0.591 | 0.594 | 0.259 | 0.255
MIR-Flickr25K Class | mAP Paired (I→T) | mAP Paired (T→I) | mAP Image Unpair (I→T) | mAP Image Unpair (T→I) | mAP Text Unpair (I→T) | mAP Text Unpair (T→I) | Diff. Image Unpair (I→T) | Diff. Image Unpair (T→I) | Diff. Text Unpair (I→T) | Diff. Text Unpair (T→I) |
---|---|---|---|---|---|---|---|---|---|---|
1-Animals (271/2308) | 0.777 | 0.744 | 0.647 | 0.723 | 0.779 | 0.649 | −16.73% | −2.89% | 0.21% | −12.76% |
2-Baby (17/168) | 0.881 | 0.815 | 0.752 | 0.866 | 0.897 | 0.809 | −14.68% | 6.32% | 1.73% | −0.65% |
3-Bird (63/552) | 0.780 | 0.764 | 0.653 | 0.745 | 0.770 | 0.670 | −16.29% | −2.56% | −1.32% | −12.37% |
4-Car (90/926) | 0.879 | 0.869 | 0.800 | 0.861 | 0.896 | 0.818 | −8.90% | −0.96% | 1.99% | −5.91% |
5-Clouds (364/2883) | 0.901 | 0.906 | 0.784 | 0.859 | 0.897 | 0.812 | −12.98% | −5.24% | −0.38% | −10.37% |
6-Dog (58/508) | 0.791 | 0.755 | 0.648 | 0.714 | 0.785 | 0.656 | −17.99% | −5.44% | −0.76% | −13.02% |
7-Female (433/4243) | 0.894 | 0.879 | 0.783 | 0.863 | 0.892 | 0.801 | −12.43% | −1.79% | −0.20% | −8.87% |
8-Flower (223/1273) | 0.834 | 0.866 | 0.692 | 0.828 | 0.834 | 0.752 | −17.09% | −4.41% | −0.11% | −13.16% |
9-Food (73/747) | 0.734 | 0.692 | 0.562 | 0.707 | 0.760 | 0.589 | −23.36% | 2.12% | 3.53% | −14.97% |
10-Indoor (550/5899) | 0.836 | 0.791 | 0.667 | 0.795 | 0.844 | 0.687 | −20.18% | 0.61% | 0.98% | −13.15% |
11-Lake (27/609) | 0.873 | 0.866 | 0.758 | 0.836 | 0.879 | 0.779 | −13.14% | −3.54% | 0.65% | −10.03% |
12-Male (447/4375) | 0.899 | 0.878 | 0.785 | 0.862 | 0.886 | 0.802 | −12.64% | −1.76% | −1.39% | −8.66% |
13-Night (227/2078) | 0.850 | 0.841 | 0.720 | 0.820 | 0.836 | 0.749 | −15.24% | −2.57% | −1.57% | −11.01% |
14-People (769/7227) | 0.892 | 0.872 | 0.772 | 0.858 | 0.885 | 0.792 | −13.39% | −1.70% | −0.81% | −9.24% |
15-Plant_life (728/6535) | 0.870 | 0.881 | 0.773 | 0.833 | 0.878 | 0.802 | −11.20% | −5.37% | 0.83% | −8.94% |
16-Portrait (292/2524) | 0.890 | 0.860 | 0.757 | 0.867 | 0.890 | 0.783 | −14.91% | 0.79% | 0.06% | −8.97% |
17-River (43/701) | 0.885 | 0.883 | 0.748 | 0.829 | 0.862 | 0.765 | −15.44% | −6.17% | −2.63% | −13.46% |
18-Sea (87/961) | 0.848 | 0.843 | 0.761 | 0.809 | 0.877 | 0.769 | −10.31% | −3.99% | 3.43% | −8.69% |
19-Sky (639/6020) | 0.895 | 0.900 | 0.790 | 0.851 | 0.891 | 0.812 | −11.74% | −5.41% | −0.43% | −9.75% |
20-Structures (779/7626) | 0.888 | 0.884 | 0.787 | 0.849 | 0.887 | 0.806 | −11.33% | −3.91% | −0.11% | −8.84% |
21-Sunset (215/1696) | 0.884 | 0.914 | 0.768 | 0.850 | 0.883 | 0.792 | −13.20% | −7.07% | −0.09% | −13.36% |
22-Transport (201/2219) | 0.877 | 0.875 | 0.777 | 0.820 | 0.878 | 0.790 | −11.48% | −6.28% | 0.11% | −9.78% |
23-Tree (342/3564) | 0.899 | 0.901 | 0.810 | 0.857 | 0.910 | 0.835 | −9.85% | −4.83% | 1.29% | −7.28% |
24-Water (271/2472) | 0.837 | 0.839 | 0.733 | 0.793 | 0.845 | 0.746 | −12.39% | −5.48% | 1.00% | −11.10% |
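Assuming that Formula (4), referenced in the caption of Figure 10, is the usual relative percentage change with respect to paired training, the ‘Performance Difference’ columns above can be reproduced as in the sketch below; for the Animals class under image unpairing, (0.647 − 0.777)/0.777 × 100 ≈ −16.73%, matching the first row.

```python
def performance_change(map_unpaired: float, map_paired: float) -> float:
    """Relative change (%) of unpaired-training mAP with respect to paired-training mAP."""
    return (map_unpaired - map_paired) / map_paired * 100.0

print(round(performance_change(0.647, 0.777), 2))   # -16.73, as in the Animals row above
```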