Abstract
Multi-modal medical image fusion (MMIF) has found wide application in disease diagnosis and surgical guidance. Despite the popularity of deep learning (DL)-based fusion methods, these algorithms often fail to deliver satisfactory fusion performance because they struggle to capture local information and long-range dependencies effectively. To address these issues, this paper presents an unsupervised MMIF method that combines a densely-connected high-resolution network (DHRNet) with a hybrid transformer. In this method, local features are first extracted from the source images by the DHRNet. These features are then fed into the fine-grained attention module of the hybrid transformer, which produces global features by exploring their long-range dependencies. The local and global features are fused by the projection attention module of the hybrid transformer. Finally, the fused image is reconstructed from the fused features by the decoder network. The network is trained with an unsupervised loss function combining the edge preservation value, structural similarity, the sum of the correlations of differences, and the structural tensor. Experiments on various multi-modal medical images show that, compared with several traditional and DL-based fusion methods, the presented method generates visually better fused results and achieves better quantitative metric values.
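To make the training objective concrete, below is a minimal PyTorch sketch of a composite unsupervised fusion loss in the spirit of the one described above. The SSIM and SCD (sum of the correlations of differences) terms follow their standard definitions; the gradient term is a simplified stand-in for the paper's edge-preservation and structural-tensor terms, and the weights (w_ssim, w_scd, w_grad) are illustrative assumptions, not the authors' values.

```python
# Hedged sketch of a composite unsupervised fusion loss (not the paper's
# exact implementation). Inputs are single-channel NCHW tensors: the fused
# image and the two source modalities.
import torch
import torch.nn.functional as F

def _corr(x, y, eps=1e-8):
    """Pearson correlation between two batches of images, per sample."""
    x = x - x.mean(dim=(1, 2, 3), keepdim=True)
    y = y - y.mean(dim=(1, 2, 3), keepdim=True)
    num = (x * y).sum(dim=(1, 2, 3))
    den = torch.sqrt((x * x).sum(dim=(1, 2, 3)) * (y * y).sum(dim=(1, 2, 3)))
    return num / (den + eps)

def scd(fused, a, b):
    """Sum of the correlations of differences (Aslantas & Bendes, 2015)."""
    return _corr(fused - a, b) + _corr(fused - b, a)

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2, win=11):
    """Mean SSIM with a uniform window (simplified from the Gaussian form)."""
    mu_x = F.avg_pool2d(x, win, 1, win // 2)
    mu_y = F.avg_pool2d(y, win, 1, win // 2)
    sx = F.avg_pool2d(x * x, win, 1, win // 2) - mu_x ** 2
    sy = F.avg_pool2d(y * y, win, 1, win // 2) - mu_y ** 2
    sxy = F.avg_pool2d(x * y, win, 1, win // 2) - mu_x * mu_y
    s = ((2 * mu_x * mu_y + c1) * (2 * sxy + c2)) / \
        ((mu_x ** 2 + mu_y ** 2 + c1) * (sx + sy + c2))
    return s.mean()

def gradients(img):
    """Sobel gradients of a single-channel batch (edge/structure proxy)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    return F.conv2d(img, kx, padding=1), F.conv2d(img, ky, padding=1)

def fusion_loss(fused, a, b, w_ssim=1.0, w_scd=0.5, w_grad=1.0):
    """Composite loss: maximize SSIM to both sources and SCD, and pull the
    fused gradients toward the per-pixel stronger source gradient (a crude
    stand-in for the edge-preservation and structural-tensor terms)."""
    l_ssim = 2.0 - ssim(fused, a) - ssim(fused, b)
    l_scd = 2.0 - scd(fused, a, b).mean()   # SCD ranges up to 2
    fx, fy = gradients(fused)
    ax, ay = gradients(a)
    bx, by = gradients(b)
    tx = torch.where(ax.abs() >= bx.abs(), ax, bx)
    ty = torch.where(ay.abs() >= by.abs(), ay, by)
    l_grad = F.l1_loss(fx, tx) + F.l1_loss(fy, ty)
    return w_ssim * l_ssim + w_scd * l_scd + w_grad * l_grad
```

Because every term is computed from the two source images alone, the loss requires no ground-truth fused image, which is what makes the training unsupervised.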
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (NSFC) (Grant No. 61871440). The authors also thank the medical ultrasound lab at Huazhong University of Science and Technology for providing the GPU computation platform.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Zhou, Q., Ye, S., Wen, M. et al. Multi-modal medical image fusion based on densely-connected high-resolution CNN and hybrid transformer. Neural Comput & Applic 34, 21741–21761 (2022). https://doi.org/10.1007/s00521-022-07635-1