Abstract
Medical cross-modal retrieval aims to retrieve semantically similar medical instances across modalities, such as retrieving X-ray images with radiology reports or retrieving radiology reports with X-ray images. The main challenges in medical cross-modal retrieval are the semantic gap and the small visual differences between categories of medical images. To address these issues, we present a novel end-to-end deep hashing method, called Deep Medical Cross-Modal Attention Hashing (DMCAH), which extracts global features using global average pooling and local features using recurrent attention. Specifically, we move recursively from coarse to fine-grained image regions to locate discriminative regions more accurately, and recursively extract discriminative semantic information from texts, from the sentence level down to the word level. We then select discriminative features by aggregating the finer features via adaptive attention. Finally, to reduce the semantic gap, we map image and report features into a common space and obtain discriminative hash codes. Comprehensive experimental results on the large-scale medical dataset MIMIC-CXR and the natural-scene dataset MS-COCO show that DMCAH achieves better performance than existing cross-modal hashing methods.
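The final step described above, projecting both modalities into a common space and binarizing into hash codes, can be sketched as follows. This is a minimal illustrative sketch, not the paper's trained networks: the feature dimensions, the random linear projections, and the tanh-then-sign relaxation are all assumptions made for the example.

```python
import numpy as np

# Hypothetical dimensions: 512-d image features, 256-d text features, 32-bit codes.
D_IMG, D_TXT, K = 512, 256, 32
rng = np.random.default_rng(0)

def hash_codes(features, W, b):
    """Project features into the common space, then binarize with sign."""
    h = np.tanh(features @ W + b)   # continuous common-space embedding in (-1, 1)
    return np.where(h >= 0, 1, -1)  # discrete hash code in {-1, +1}^K

def hamming_distance(a, b):
    """For {-1,+1} codes of length K, Hamming distance = (K - <a, b>) / 2."""
    return (a.shape[-1] - a @ b) // 2

# Separate (untrained) projections per modality into the shared K-bit space.
W_img, b_img = rng.standard_normal((D_IMG, K)), np.zeros(K)
W_txt, b_txt = rng.standard_normal((D_TXT, K)), np.zeros(K)

img_code = hash_codes(rng.standard_normal(D_IMG), W_img, b_img)
txt_code = hash_codes(rng.standard_normal(D_TXT), W_txt, b_txt)
print(hamming_distance(img_code, txt_code))  # distance in [0, K]; lower = more similar
```

Retrieval then reduces to ranking database codes by Hamming distance to the query code, which is what makes hashing attractive for large-scale search: distances are computed with cheap bitwise operations rather than dense similarity in the original feature spaces.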
Acknowledgements
Weihua Ou is the corresponding author. This work was supported by the National Natural Science Foundation of China (No. 61762021), the Excellent Young Scientific and Technological Talent of Guizhou Provincial Science and Technology Foundation ([2019]-5670), and the Special Project of Guizhou Normal University in 2019 on Cultivation and Innovation Exploration of Academic Talents.
Additional information
This article belongs to the Topical Collection: Special Issue on Synthetic Media on the Web
Guest Editors: Huimin Lu, Xing Xu, Jože Guna, and Gautam Srivastava
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zhang, Y., Ou, W., Shi, Y. et al. Deep medical cross-modal attention hashing. World Wide Web 25, 1519–1536 (2022). https://doi.org/10.1007/s11280-021-00881-8